Introduction


1.1  Overview 

The  MIT  Laboratory  for  Computer  Science  (LCS)  is  an  interdepartmental  laboratory  whose 
principal  goal  is  research  in  computer  science  and  engineering. 

In  1963,  when  the  Laboratory  was  founded  as  Project  MAC,  it  explored  and  developed  one 
of  the  world’s  earliest  timeshared  computer  systems.  This  1960’s  research  on  the  Compatible 
Time  Sharing  System  (CTSS),  and  its  successor  MULTICS,  contributed  innovations  like  the 
writing  of  operating  systems  in  high  level  programming  languages,  virtual  memory,  tree 
directories,  online  scheduling  algorithms,  line  and  page  editors,  secure  operating  systems, 
concepts  and  techniques  for  access  control,  computer  aided  design,  and  two  of  the  earliest 
computer games: space wars and computer chess.

These  early  developments  laid  the  foundations  for  the  Laboratory’s  work  in  the  1970’s  on 
knowledge based systems (for example, the MACSYMA program for symbolic mathematics),
on natural language understanding, and on the development (with BBN) and use of packet
networks.  In  the  1970’s,  the  Laboratory  also  developed  theoretical  results  in  complexity 
theory  and  linked  cryptography  to  computer  science  through  concepts  and  algorithms  for 
public key encryption (RSA). In the late 1970’s, Project MAC, renamed as the Laboratory for
Computer  Science  (LCS),  embarked  on  research  in  such  areas  as  clinical  decision  making, 
in  the  exploration  of  cellular  automata  at  the  borderline  between  physics  and  computation, 
and  on  the  social  impact  of  computers.  At  the  same  time,  the  Laboratory  began  two  major 
research  programs  in  distributed  systems  and  languages,  and  in  parallel  systems.  These  led 
to  the  notions  of  data  abstractions  and  the  Clu  language;  the  Argus  distributed  system;  the 
dataflow  principle  and  associated  languages,  and  architectures  of  parallel  systems;  local  area 
ring  networks;  to  program  specification;  and  workstation  development,  where  the  Laboratory 
contributed  the  earliest  UNIX  ports  and  compilers  and  the  NuBus  architecture,  now  used  in 
commercial computers like Apple’s Macintosh II.

The Laboratory’s current research falls into four principal categories: Parallel Systems;
Systems, Languages, and Networks; Intelligent Systems; and Theory. The principal technical
goals  and  expected  consequences  in  each  of  these  four  categories  are  as  follows: 

In  Parallel  Systems,  we  strive  to  harness  the  power  and  economy  of  numerous  processors 
working  on  the  same  task.  Research  in  the  area  involves  the  analysis  and  construction 
of  various  hardware  architectures  and  programming  languages  that  yield,  over  a  broad  set 
of  applications,  cost-performance  improvements  of  several  orders  of  magnitude  relative  to 
single processors. This research is expected to affect most of tomorrow’s machines, which we
expect to be of the multiprocessor variety, not only because of potential cost-performance
benefits but also because of the natural, yet unexploited, concurrency that characterizes
contemporary and prospective applications from business to sensory computing.

In  Systems,  Languages,  and  Networks,  our  objective  is  to  provide  the  concepts,  methods, 
and  environments  that  will  enable  heterogeneous  computers,  each  working  on  different  tasks, 
to communicate efficiently, conveniently, and reliably with one another in order to exchange
information needed and supplied by their respective programs. Such communication may
involve,  beyond  conventional  electronic  mail  and  file  transfer,  the  calling  of  programs  in  one 
environment  from  programs  in  another,  perhaps  different,  environment  and  the  sharing  of 
structured  data  among  such  programs.  This  research  is  also  expected  to  have  a  broad  impact 
on  future  systems,  since  virtually  every  machine  will  be  connected  to  a  network. 

Taken  together,  these  two  thrusts  in  parallel  and  networked  machines  signal  our  expectation 
that  future  computer  systems  will  consist  of  multiprocessors  interconnected  by  local  and 
long  haul  networks,  and  perhaps  someday  by  national  network  infrastructures  as  ubiquitous 
and  as  important  as  today’s  telephone  and  highway  infrastructures. 

In  the  Intelligent  Systems  area,  our  technical  goals  are  to  understand  and  construct  programs 
and  machines  that  have  greater  and  more  useful  sensory  and  cognitive  capabilities.  Examples 
include  the  understanding  of  spoken  messages,  systems  that  can  learn  from  practice  rather 
than  by  being  explicitly  programmed,  and  programs  that  reason  about  clinical  issues  and 
help  in  clinical  decision  making.  We  expect  tomorrow’s  intelligent  systems  to  be  easier  to 
use  than  today’s  programs  across  a  broad  front  of  applications. 

In our fourth category of research, Theory, we strive to understand and discover the
fundamental forces, rules, and limits of computer science. Theoretical work permeates many
of our research efforts in the other three areas, for example, in the pursuit of parallel
algorithms and in the study of fundamental properties of idealized parallel architectures and
computer networks. Theory also touches on several predominantly abstract areas, from the
logic of programs, the inherent complexity of computations, and the use of cryptography
and randomness to the formal characterization of knowledge. The impact of theoretical
computer  science  upon  our  world  is  expected  to  continue  its  past  record  of  improving  our 
understanding  and  helping  us  pursue  new  frontiers  with  new  models,  concepts,  methods, 
and  algorithms. 

1.2  Highlights  of  the  Year 

The year 1988 marked the 25th Anniversary of our Laboratory. The occasion was celebrated
with  a  two  day  symposium  on  current  research  for  an  international  audience  of  some  1000 
people,  and  a  testimonial  banquet  attended  by  over  1200  members  and  guests  of  both  the  LCS 
and AI laboratories. Chaired by Professor Albert R. Meyer, the celebration was memorable
and  successful. 

Research  highlights  during  the  reporting  period  were  as  follows: 

1. Dr. Victor Zue and his research group moved from MIT’s Research Laboratory of
Electronics to LCS. This move promises to be significant both for the speech research effort
and for other LCS groups through the potential synergism between speech research and
computer architecture.

2. We concluded a major agreement with Motorola to build the Dataflow Machine,
conceived and designed by the Computation Structures Group. This effort is significant for
it represents the first major test of this new architecture invented by our Laboratory.

3. A new group, the Computer Architecture Group, was formed. It includes Professors
Agarwal, Dally, and Ward, their students and staff. This large group of
computer architects plans to embark on a new project, NuMesh. The NuMesh involves
an  interconnection  and  intercommunication  standard  that  goes  beyond  the  notion  of 
a  computer  bus  to  three  dimensional  structures.  As  currently  envisioned,  the  NuMesh 
will consist of small cubes (about 2 cm on a side) which will be able to plug together
with other similar cubes at all six of their sides. Each cube will contain processing and
communication chips. We envision that users of this technology will first construct in
Tinkertoy  fashion  a  special  purpose  aggregate  that  is  best  suited  to  their  problems, 
and  will  then  run  these  problems  on  the  so-constructed  machine.  This  approach  is 
thus  expected  to  make  possible  the  benefits  of  special  purpose  computation  out  of 
general  purpose  subsystems.  We  are  planning  to  fund  this  research  out  of  an  industrial 
consortium  of  manufacturers. 


During 1988-89, the Laboratory continued its successful Distinguished Lecturer Series with
presentations by David L. Parnas, Professor of Computing and Information Science, Queen’s
University; David S. Johnson, Department Head, Mathematical Foundations of Computing,
AT&T Bell Laboratories; Raj Reddy, Director of the Robotics Institute, Carnegie Mellon
University;  and  Robert  W.  Taylor,  Director,  Systems  Research  Center,  Digital  Equipment 
Corporation. 

During this reporting period, Professor David L. Tennenhouse joined the Advanced Network
Architecture Group; Drs. Gregory Papadopoulos and Gill Pratt became Research Associates
in the Computation Structures Group and the newly formed Computer Architecture Group,
respectively. Dr. Victor Zue and his Spoken Language Systems Group, including four research
scientists, 15 students, two visitors, and two support staff, also joined the Laboratory. Two
staff accountants, Ms. Azi Djazani and Mr. David Ruble, joined the LCS administrative
staff, and Ms. Mary Mitchell joined MIT and LCS as Administrative Officer of the Laboratory.

The  Laboratory  is  organized  into  18  research  groups,  an  administrative  unit,  and  a  computer 
service support unit. The Laboratory’s membership includes a total of 350 people: 105
faculty  and  research  staff,  35  visitors,  affiliates,  and  postdoctoral  associates,  30  support  staff, 
125 graduate students, and 55 undergraduate students. The academic affiliation of most of
the  Laboratory’s  faculty  and  students  is  with  the  Department  of  Electrical  Engineering 
and Computer Science (EECS). The funding is predominantly from the U.S. Government’s
Defense Advanced Research Projects Agency, which accounts for about half of the total.
The Laboratory is also funded by and has extensive links with industrial organizations.
These include partnerships for the construction of major hardware systems, consortia for the
development and maintenance of standards, like X Windows, and joint studies on research
areas of common concern. Technical results of our research in 1988-89 were disseminated
through publications in the technical literature, through Technical Report numbers 426
through 453, and Technical Memoranda numbers 363 through 400.


Advanced Network Architecture


2.1  Introduction 


The  Advanced  Network  Architecture  project  continues  to  explore  a  number  of  problems 
related  to  the  design  of  advanced  data  networks.  As  networks  get  bigger  and  faster,  it  is 
important  to  explore  new  design  approaches,  since  the  current  assumptions  and  protocols 
may  not  scale  well  to  match  the  expectations  of  tomorrow. 

The central problem of our group has been the management of resources within the network:
bandwidth, switching capacity, and buffering. If we are to achieve higher speeds and larger
size,  the  tradeoffs  among  these  resources  must  change,  and  new  algorithms  and  approaches 
will  be  needed. 

In the following sections, a number of specific projects related to this overall goal are
described.


2.2  Network  Control  Algorithms 


Lixia  Zhang  has  nearly  completed  work  on  a  new  network  architecture  which  integrates 
resource  management  and  traffic  control  into  the  system.  The  new  architecture  can  support 
a  wide  variety  of  applications  and  incorporate  new  technologies.  It  includes  three  basic 
parts: an elemental data transmission entity called a flow, an interface between users and
the  network  which  allows  a  flow  to  specify  a  set  of  performance  attributes,  and  a  distributed 
control  algorithm  that  regulates  network  traffic  to  ensure  overall  performance. 


2.3  Rate-based  Flow  Control 

Previously,  several  members  of  the  group  designed  a  new  transport  protocol,  NETBLT, 
which  contained  novel  algorithms  for  flow  control  and  error  recovery  [79].  In  particular,  the 
protocol  contained  a  flow  control  algorithm  based  on  rate  regulation,  rather  than  window 
permissions.  Rate  control  is  expected  to  provide  smoother  and  more  effective  utilization  of 
high  bandwidth,  long  delay  links. 
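
The contrast between window permissions and rate regulation can be made concrete with a little arithmetic. The following Python sketch is illustrative only and is not NETBLT itself: the function names, the 64 KB window, and the link parameters are assumptions chosen for the example. A window-based sender can have at most one window of data outstanding per round trip, so its throughput is bounded by window / RTT regardless of link speed, while a rate-based sender simply spaces its packets evenly at the chosen rate.

```python
def window_limited_throughput(window_bytes, rtt_seconds):
    """Upper bound on throughput (bytes/s) when at most one window
    of data may be outstanding per round trip."""
    return window_bytes / rtt_seconds


def paced_send_times(n_packets, packet_bytes, rate_bytes_per_s, start=0.0):
    """Departure times (seconds) for packets spaced evenly at a fixed rate."""
    interval = packet_bytes / rate_bytes_per_s
    return [start + i * interval for i in range(n_packets)]


if __name__ == "__main__":
    # Hypothetical long delay link: 0.5 s round trip, 64 KB window.
    bound = window_limited_throughput(64 * 1024, 0.5)
    print(f"window-limited bound: {bound:.0f} bytes/s")
    # Hypothetical rate-based sender: 1 MB/s with 1000-byte packets,
    # which works out to one packet per millisecond.
    print(paced_send_times(4, 1000, 1_000_000))
```

On the hypothetical half-second path, the window bound stays at 131072 bytes per second however fast the link is, which is the kind of limit that rate regulation is meant to avoid.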

The  first  version  of  that  protocol  did  not  have  an  algorithm  for  dynamic  adjustment  of  the 
rate,  but  instead  used  manual  adjustment.  While  manual  adjustment  was  sufficient  for  a 
first set of experiments, it was not the basis for a practical system. Subsequently, Mark
Lambert  proposed  a  number  of  dynamic  rate  adjustment  algorithms. 

Helmut Rebstock, a visiting scientist from Siemens Corporation, assisted by James Davin,
used the interactive network simulator to explore the behavior of these proposed algorithms
for dynamic adjustment of transmission rates in NETBLT. Rebstock began a study of the
capacity of adaptive variants of NETBLT to support equitable sharing of bandwidth among
connections in a congested network.

Andrew Heybey performed experiments to extend the results reported previously in his
undergraduate  thesis  [161].  A  form  of  slow  start  was  added  to  Mosely’s  rate-based  protocol 
[239]  and  was  shown  (through  simulation)  to  increase  the  percentage  of  the  link  capacity 
that  can  be  used.  The  most  important  result  is  the  observation  that  the  protocol  (with  or 
without  slow  start)  operates  in  either  a  stable  or  unstable  region.  As  the  fraction  of  the 
link’s  capacity  that  the  protocol  attempts  to  use  is  increased,  the  queue  lengths  abruptly 
change  from  an  average  of  approximately  five  to  wild  oscillation  between  zero  and  several 
thousand. (In these experiments, there is no limit on queue length, and no packets are ever
dropped.)

2.4  Fair  Queueing  in  Gateways 

James Davin and Andrew Heybey experimented with a novel “Fair Share” queueing
algorithm developed at Xerox PARC [94]. They simulated the described algorithm and verified
its correct operation in several simple network topologies both with and without the presence
of ill-behaved users. In its simplest form, the algorithm enforces fairness: no user may use
more than its fair share of the output bandwidth. It can also be used to enforce policy by
giving some users a larger share of the bandwidth than others. Because the present
algorithm only comes into play when the output queue length in the switch is greater than zero,
the extension of fair queueing for operation on an underutilized link is being contemplated.
Packet  discard  strategies  are  also  being  studied. 
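
The idea of a fair share, and of unequal shares set by policy, can be illustrated with a small scheduler. The Python sketch below is not the algorithm of [94] as simulated by the group; it is a simplified, self-clocked approximation written for illustration, and the class name, the weights, and the flow labels are all assumptions. Each arriving packet receives a finish tag equal to its flow's previous tag (or the current virtual time, if later) plus its size divided by the flow's weight, and the gateway always transmits the queued packet with the smallest tag.

```python
import heapq
from itertools import count


class FairQueue:
    """Simplified fair queueing sketch (hypothetical, self-clocked variant)."""

    def __init__(self, weights=None):
        self.weights = weights or {}   # flow -> relative share (default 1.0)
        self.last_finish = {}          # flow -> finish tag of its latest packet
        self.virtual_time = 0.0        # tag of the packet most recently sent
        self.heap = []                 # (finish tag, arrival order, flow, size)
        self._seq = count()            # tie-breaker: earlier arrival goes first

    def enqueue(self, flow, size):
        weight = self.weights.get(flow, 1.0)
        start = max(self.virtual_time, self.last_finish.get(flow, 0.0))
        finish = start + size / weight
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, next(self._seq), flow, size))

    def dequeue(self):
        """Return (flow, size) of the next packet to send, or None if idle."""
        if not self.heap:
            return None
        finish, _, flow, size = heapq.heappop(self.heap)
        self.virtual_time = finish
        return flow, size


if __name__ == "__main__":
    fq = FairQueue()
    for _ in range(6):
        fq.enqueue("hog", 100)       # ill-behaved user floods the queue
    for _ in range(2):
        fq.enqueue("polite", 100)    # well-behaved user arrives afterwards
    while (pkt := fq.dequeue()) is not None:
        print(pkt[0])
```

In this toy run the two packets of the well-behaved flow are interleaved with the first packets of the flooding flow instead of waiting behind all six of them. Passing a weight greater than one for a flow gives it a proportionally larger share, which corresponds to the policy use mentioned above.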


2.5  Protocol  Performance  Studies 


David  Clark  engaged  in  a  study  of  TCP  processing  overhead  that  strongly  suggests  that  the 
details  of  TCP  are  not  a  central  issue  in  host  level  processing.  The  results  of  this  study 
indicate  that,  with  current  RISC  processors,  the  specific  TCP  processing  steps  would  permit 
packet  transmission  at  a  large  fraction  of  a  gigabit  per  second.  This  work,  performed  jointly 
with  V.  Jacobson  at  LBL,  H.  Salwen  at  Proteon,  Inc.,  and  J.  Romkey,  is  reported  in  [82]. 

Eman Hashem studied packet clustering in network environments similar to those of the
Internet. This work aimed at analyzing the causes and consequences of packet aggregation and
its  effect  on  congestion.  Two  major  causes  were  identified:  the  TCP  slow  start  retransmission 
strategy,  and  the  interaction  of  the  gateway  and  TCP  congestion  control  mechanisms. 

The slow start algorithm opens the TCP sender window exponentially by incrementing the
window size by one for each acknowledgment received. This behavior leads to the clustering
of the packets belonging to each TCP connection that is newly opened or is recovering from
congestion. This phenomenon is not very harmful on its own, as the clustering persists only as
long as the window is being opened. In a heavily loaded network, however, the TCP slow start
algorithm and gateway congestion control schemes interact so as to increase the frequency
of exponential window sizing. The gateway signals congestion by discarding packets in
excess of its capacity. TCP responds to this signal by shutting its window and reopening it
exponentially. Following recovery from packet loss, the TCP continues to open the window
linearly.  Eventually,  the  previous  level  of  congestion  is  again  realized,  and  the  recovery 
process  is  repeated.  Thus,  the  network  oscillates  between  congestion  recovery  and  load 
optimization,  causing  packets  from  most  connections  to  aggregate  at  the  bottleneck  resources 
during  each  congestion  cycle.  This  global  effect,  involving  all  connections,  coupled  with  local 
clustering  of  packets  from  the  same  connection,  leads  to  a  performance  degradation.  Long 
end-to-end  delays  result  from  the  high  queueing  delays  incurred  at  the  bottleneck  resources, 
and  throughput  decreases  owing  to  bandwidth  wasted  in  retransmissions.  The  extent  of  the 
packet  clustering  effects  depends  on  the  details  of  the  congestion  schemes  and  on  how  quickly 
they  react  to  congestion.  Although  slow  start  encourages  packet  clustering,  it  minimizes 
wasted  bandwidth  by  dynamically  adjusting  its  window  size  to  the  current  network  load 
limit. 
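
The congestion cycle described above can be traced with a toy model of a single connection. The Python sketch below is a simplification written for illustration, not the simulation used in the study: time advances in whole round trips, the window is counted in packets, the gateway is reduced to a fixed capacity, and the rule that the switch from exponential to linear opening happens at half the window in use when loss occurred is an assumption taken from common slow start practice, not from the text. Because one acknowledgment returns for every packet in flight and each acknowledgment opens the window by one, the window doubles every round trip during slow start.

```python
def window_trace(rounds, ssthresh, capacity):
    """Window size (in packets) at the start of each round trip.

    ssthresh: window size at which exponential opening gives way to linear.
    capacity: packets per round trip the bottleneck can carry (hypothetical).
    """
    cwnd = 1
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity:
            # The gateway discards the excess; the sender shuts its window.
            # Assumption: the new threshold is half the window that failed.
            ssthresh = max(cwnd // 2, 1)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: cwnd acknowledgments arrive, each adds one packet.
            cwnd = min(cwnd + cwnd, ssthresh)
        else:
            # After recovery the window opens linearly, one packet per round.
            cwnd += 1
    return trace


if __name__ == "__main__":
    print(window_trace(rounds=12, ssthresh=8, capacity=10))
```

With these hypothetical parameters the trace is 1, 2, 4, 8, 9, 10, 11, 1, 2, 4, 5, 6: an exponential burst, a linear climb back to the point of congestion, a collapse, and then the same cycle again.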

Another approach to studying TCP behavior was pursued by Timothy Shepard. A system
for collecting and storing about 12 hours of the protocol headers of all the packets on one
of the main Ethernets in the Laboratory has been built and is now in continuous operation.
The  system  has  been  useful  as  a  stand-alone  aid  for  debugging  failures  of  the  operational 
network.  It  provides  easy  access  to  packet  traces  for  developers  of  experimental  systems  and 
protocols.  The  system  is  used  mainly  as  a  source  of  traces  to  support  research  in  the  analysis 
of  TCP  packet  traces. 

A  study  in  the  analysis  of  TCP  packet  traces  is  in  progress.  This  study  explores  the  graphical 
presentation  of  packet  traces  to  a  human  analyst  and  its  effect  on  the  human’s  ability  to 
absorb  and  understand  a  packet  trace. 

2.6  Random  Drop  Queue  Management 


The congestion control scheme currently employed in Internet gateways is a simple
mechanism that requires no information about the connections passing through each gateway.
In the absence of such per-connection information, it is difficult to give the connections
accurate signals in the event of congestion. One hypothesis advanced within the Internet
Engineering Task Force was that a slightly more intelligent, statistical mechanism might
afford satisfactory results without the need for per-connection information. This mechanism
was dubbed “random drop” because it entails discard of a randomly chosen packet from
the bottleneck queue whenever congestion is detected. In theory, the random drop scheme
discriminates against connections that consume an inordinate share of available bandwidth,
for such connections are likely to have more packets in the queue than other connections,
and the probability of packet loss for any one connection is proportional to the frequency of
its representation in the bottleneck queue. On the strength of this analysis, random drop
was expected to afford both service equity and improved aggregate performance.
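
The proportionality argument can be checked directly. In the Python sketch below, which is an illustration and not the group's simulator, the bottleneck queue is modeled as a list holding one connection label per queued packet; the function names and the example labels are assumptions. A uniformly random victim is removed from the queue, so each connection's chance of losing a packet equals its fraction of the queue.

```python
import random
from collections import Counter


def loss_share(queue):
    """Probability that a uniformly chosen victim belongs to each connection."""
    total = len(queue)
    return {conn: n / total for conn, n in Counter(queue).items()}


def random_drop(queue, rng=random):
    """Remove and return one packet chosen uniformly at random from the queue."""
    victim = rng.randrange(len(queue))
    return queue.pop(victim)


if __name__ == "__main__":
    # Hypothetical bottleneck queue: connection A holds 8 packets, B holds 2.
    queue = ["A"] * 8 + ["B"] * 2
    print(loss_share(queue))
    print("dropped a packet from", random_drop(queue))
```

The gateway needs no per-connection state to do this: the queue contents themselves carry the statistical information.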

By simulating the random drop mechanism, Eman Hashem found that it does not perform as
well as expected and that performance improvements over the current scheme are negligible.
While random drop can improve the fairness of the gateway packet drop, it has no
necessary effect upon the aggregate distribution of bandwidth in the network, which is largely
determined by the interaction between gateways and TCP congestion control schemes. Even
though random drop penalizes the connections fairly for causing congestion, it cannot
control their flow. Any connection that attempts to maximize its flow is rewarded with a higher
share  of  available  bandwidth.  For  example,  connections  with  short  end-to-end  delay  react  to 
the gateway congestion signal quickly and recover faster, thus realizing higher throughput
by squeezing bandwidth away from longer delay connections. A similar bandwidth
advantage is realized by misbehaving connections that use redundant transmissions to increase the
probability  of  data  reaching  the  destination  in  a  minimum  number  of  round  trips.  Thus,  any 
TCP  connection,  well  behaved  or  misbehaved,  that  has  the  ability  to  increase  its  flow  above 
the  other  connections  can  achieve  a  higher  bandwidth  share  even  while  employing  random 
drop.  Unfortunately,  this  behavior  also  degrades  the  performance  of  the  other  connections, 
for  they  spend  much  of  their  reduced  bandwidth  shares  recovering  from  the  congestion  caused 
by  the  aggressive  connections. 

One  variation  on  random  drop  is  to  drop  packets  with  some  small  probability  before  the 
buffer  of  the  bottleneck  resource  is  100%  full.  In  this  way,  the  connections  contributing 
most  to  congestion  are  afforded  an  early  signal  to  slow  down  before  gateway  queues  begin 
to  overflow.  This  scheme,  dubbed  “early  random  drop,”  may  represent  a  profitable  balance 
between  dropping  too  early,  effectively  reducing  the  resource’s  utilization,  and  dropping  too 
late,  achieving  no  improvement  over  the  simple  random  drop.  Early  random  drop  is  still 
under  study  with  no  significant  advantages  seen  so  far. 
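
The trade-off in early random drop lies in two parameters: how full the buffer must be before early drops begin, and how likely each early drop is. The Python sketch below shows one possible form of the decision, made when a packet arrives; the text does not say whether the discarded packet is the arriving one or one chosen from the queue, so that choice, the function name, and the parameter names are all assumptions.

```python
import random


def early_random_drop(queue_len, capacity, drop_level, drop_prob, rng=random):
    """Decide whether to discard a packet arriving at the bottleneck.

    queue_len:  packets currently queued
    capacity:   buffer size; at this point every arrival must be dropped
    drop_level: queue length at which early drops begin (hypothetical knob)
    drop_prob:  small probability of an early drop (hypothetical knob)
    """
    if queue_len >= capacity:
        return True                       # buffer full: overflow
    if queue_len >= drop_level:
        return rng.random() < drop_prob   # early signal, sent only occasionally
    return False                          # light load: never drop


if __name__ == "__main__":
    rng = random.Random(0)
    drops = sum(early_random_drop(45, 50, 40, 0.02, rng) for _ in range(10_000))
    print(f"{drops} early drops in 10000 arrivals at queue length 45")
```

Setting drop_level low or drop_prob high corresponds to dropping too early and wasting capacity; setting drop_level at the buffer size reduces the scheme to ordinary overflow behavior.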


2.7  Advanced  Network  Simulator 

Previously,  Andrew  Heybey  and  David  Martin,  with  help  from  other  members  of  the  group, 
developed  a  network  simulator  to  support  the  research  described  in  the  previous  sections.  The 
simulator  uses  the  X  Window  System  to  display  the  state  of  the  network  as  the  simulation  is 
running,  and  to  allow  the  user  to  use  the  mouse  to  change  the  simulation  parameters.  Data 
produced  by  the  simulation  can  also  be  logged  to  disk  for  post-processing.  The  simulator, 
by  permitting  a  visual  display  of  the  network  behavior  as  the  simulation  proceeds,  permits 
a  quick  and  intuitive  understanding  of  complex  network  behavior. 

This year, Andrew Heybey has improved the performance of the simulator by eliminating
bottlenecks  in  the  simulator’s  X  Window  user  interface  code,  and  by  using  the  ability  of 
the  GNU  C  compiler  to  compile  functions  inline  to  eliminate  procedure  call  overhead  where 
possible.  A  variety  of  bugs  have  been  fixed,  and  the  improvements  have  been  released  to 
other  interested  parties,  for  whom  at  least  a  minimal  level  of  support  is  provided.  The 
simulator is being actively used by people at Washington, Cray, Purdue and Mitre.


2.8  Network  Naming  Services 

Karen  Sollins  continued  her  work  on  providing  directory  services  in  the  Internet.  A  directory 
service permits the location and identification of people, services, and resources in order to
access them. A number of directory services exist, so the focus of this work is to provide a
framework  for  accessing  a  directory  service  that  is  general  enough  that  most  existing  directory 
services  can  add  a  simple  access  veneer,  while  providing  a  rich  communication  protocol  for 
requesting  information  from  such  a  service.  The  framework  permits  directory  services  to 
name  each  other  as  well,  in  order  to  navigate  through  a  set  of  such  services. 

Work  began  this  year  on  a  written  plan  for  developing  and  deploying  directory  services  in 
the  Internet,  and  that  effort  nears  completion. 

Sollins  organized  a  workshop  to  discuss  white  pages  directory  services  in  the  Internet.  This 
workshop met at the Corporation for National Research Initiatives in Reston, Virginia,
and  involved  participants  from  research  and  industry.  In  addition,  Sollins  participated  in  a 
workshop  organized  by  NASA  and  DOE  that  addressed  problems  of  naming  entailed  by  a 
transition  from  the  Internet  to  the  OSI  protocol  suite. 


2.9  Policy  Routing 

David  Clark  completed  work  on  a  proposal  for  policy  routing  in  the  Internet  [81].  An 
integral  component  of  the  Internet  protocols  is  the  routing  function,  which  determines  the 
series  of  networks  and  gateways  a  packet  will  traverse  in  passing  from  the  source  to  the 
destination. Although there have been a number of routing protocols used in the Internet,
they share the idea that one route should be selected out of all available routes based on
minimizing  some  measure  of  the  route,  such  as  delay.  Recently,  it  has  become  important  to 
select  routes  in  order  to  restrict  the  use  of  network  resources  to  certain  classes  of  customers. 
These  considerations,  which  are  usually  described  as  resource  policies,  are  poorly  enforced 
by  the  existing  technology  in  the  Internet.  Clark  proposes  an  approach  to  integrating  policy 
controls  into  the  Internet. 

The  proposal  models  the  resources  of  the  Internet  (networks,  links,  and  gateways)  as  being 
partitioned  into  Administrative  Regions  or  ARs.  Each  AR  has  a  globally  unique  name  and 
is  governed  by  a  somewhat  autonomous  administration  having  distinct  goals  as  to  the  class 
of  customers  it  intends  to  serve,  the  qualities  of  service  it  intends  to  deliver,  and  the  means 
for  recovering  its  cost.  To  construct  a  route  across  the  Internet,  a  sequence  of  ARs  must  be 
selected  that  collectively  supply  a  path  from  the  source  to  the  destination.  This  sequence  of 
ARs  is  called  a  Policy  Route,  or  PR.  Each  AR  through  which  a  Policy  Route  passes  will  be 
concerned that the PR has been properly constructed, that is, each AR may wish to ensure
that the user of the PR is authorized, the requested quality of service is supported, and
that the cost of the service can be recovered. Before a PR can be used, however, it must be
reduced to more concrete terms: a series of gateways which connect the sequence of ARs.
These gateways are called Policy Gateways.
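
The three checks that each AR may apply to a Policy Route can be pictured as a simple validation pass along the sequence of ARs. The Python sketch below is purely illustrative and is not taken from the proposal [81]: the class names, the field names, and the way authorization, quality of service, and cost recovery are reduced to set membership are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdministrativeRegion:
    name: str                 # globally unique AR name
    user_classes: frozenset   # classes of customers the AR intends to serve
    qos_levels: frozenset     # qualities of service the AR will deliver
    billing: frozenset        # arrangements under which it can recover cost


@dataclass(frozen=True)
class Request:
    user_class: str
    qos: str
    billing: str


def rejecting_regions(policy_route, request):
    """Names of the ARs along the route whose local policy refuses the request."""
    return [
        ar.name
        for ar in policy_route
        if not (
            request.user_class in ar.user_classes
            and request.qos in ar.qos_levels
            and request.billing in ar.billing
        )
    ]


if __name__ == "__main__":
    campus = AdministrativeRegion(
        "campus", frozenset({"research", "commercial"}),
        frozenset({"bulk", "low-delay"}), frozenset({"flat"}))
    backbone = AdministrativeRegion(
        "backbone", frozenset({"research"}),
        frozenset({"bulk"}), frozenset({"flat", "metered"}))
    route = [campus, backbone, campus]   # a Policy Route is a sequence of ARs
    print(rejecting_regions(route, Request("research", "bulk", "flat")))
    print(rejecting_regions(route, Request("commercial", "bulk", "flat")))
```

An empty result means every AR on the route accepts the request; the second request is refused by the hypothetical backbone region, which serves research traffic only.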

Clark’s  proposal  is  designed  to  permit  as  wide  a  latitude  as  possible  in  the  construction  and 
enforcement of policies. In particular, no topological restrictions are assumed. In general,
the approach is driven by the belief that, since policies reflect human concerns, the system
should primarily be concerned with enforcement of policy, rather than synthesis of policy.
The proposal permits both end points and transit services to express and enforce local policy
concerns. 


2.10  Byzantine  Routing  Algorithms 


Most  dynamic  network  layer  routing  algorithms  depend  on  the  proper  operation  of  all  the 
routing  nodes  for  their  correct  operation.  If  one  node  is  corrupted,  and  for  example  asserts 
that  it  is  the  best  route  to  all  destinations,  most  routing  algorithms  fail  to  detect  that  this 
is  an  error. 
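
The failure described above is easy to reproduce in a conventional distance-vector computation. The Python sketch below is a generic illustration, not any particular Internet routing protocol; the function name, node names, and costs are assumptions. A node chooses, for each destination, the neighbor that minimizes the link cost plus the cost that neighbor advertises, and it has no means of checking whether the advertisement is true.

```python
def best_next_hop(link_cost, advertised, destination):
    """Choose the neighbor minimizing link cost plus advertised cost.

    link_cost:  neighbor -> cost of the direct link to that neighbor
    advertised: neighbor -> {destination: cost that neighbor claims}
    """
    totals = {
        neighbor: cost + advertised[neighbor][destination]
        for neighbor, cost in link_cost.items()
        if destination in advertised.get(neighbor, {})
    }
    return min(totals, key=totals.get)


if __name__ == "__main__":
    links = {"B": 1, "C": 1}
    honest = {"B": {"D": 1}, "C": {"D": 5}}
    # Corrupted node C asserts that it is the best route (cost 0) to D.
    corrupted = {"B": {"D": 1}, "C": {"D": 0}}
    print(best_next_hop(links, honest, "D"))      # chooses B
    print(best_next_hop(links, corrupted, "D"))   # chooses C, the lying node
```

Nothing in the computation distinguishes the false advertisement from a genuine improvement, so traffic for D is diverted to the corrupted node.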

Radia Perlman completed her study [258] of Byzantine routing algorithms: routing
algorithms that continue to operate correctly even if one or more routing nodes are corrupted in
malicious ways.


2.11  Network  Management 


James  Davin  has  continued  his  efforts  to  develop  the  Simple  Network  Management  Protocol 
(SNMP)  by  participation  in  the  relevant  Internet  Engineering  Task  Force  working  groups, 
by  authoring  documents  that  specify  the  protocol  [73]  [74]  [282]  and  explain  its  design  [103], 
and  by  implementation.  During  this  period,  the  MIT  SNMP  Development  Kit  software  was 
developed and initially released. This software is a highly portable C language
implementation of the SNMP and has been ported to a variety of platforms. In particular, SNMP was
implemented  in  the  MIT  C  Gateway  as  part  of  this  effort. 


2.12  Internet  Architecture 


As  part  of  our  research  effort,  and  in  support  of  the  ongoing  extensions  and  changes  to  the 
Internet  protocol  suite,  members  of  the  group  participated  in  a  number  of  working  groups, 
and  contributed  a  number  of  design  papers  [80]. 

Internet  Activities  Board:  David  Clark  continued  to  chair  the  Internet  Activities  Board, 
the  steering  board  for  the  Internet  protocol  suite.  He  also  attended  the  meetings  of  several 
working groups and task forces of the IAB.

Inter-Autonomous System Routing Architecture: Lixia Zhang continued
participation in the Open-Routing Working Group under the Internet Engineering Task Force.

Naming  Services:  As  part  of  the  work  reported  earlier  on  architectures  for  naming,  Karen 
Sollins  is  a  member  of  the  Naming  Task  Force  of  the  Distributed  Systems  Activities  Board. 
She is also a member of the Autonomous Systems Task Force of the IAB.


Clinical Decision Making


3.1  Summary 


This year, we continued to work on the development of fundamental new methods for
representing medical knowledge and reasoning with it, exploration of means of integrating various
results into coherent research systems, and formulating methods of merging decision analytic
and AI reasoning. In addition, we are planning to use part of this year to summarize our
accomplishments  and  experiences  during  the  period  of  our  grant,  and  to  plan  our  future 
research. 


3.2  Plans 

3.2.1  Integration 

A few years ago, we hypothesized that the adoption of a uniform method of knowledge
representation (based on contemporary developments in AI research) would give us an
important advance in the ability to integrate various specific representation and reasoning
techniques. We pursued this goal aggressively, but with considerably more frustration than
we anticipated. We reviewed some of our practical difficulties with this technology late
last year [141] and have continued to explore the fundamental deficiencies in current AI
representation approaches that have underlain those difficulties [96].

During  the  next  year,  we  plan  to  attack  these  problems  by  taking  both  a  theoretical  and 
architectural  approach  to  the  problem  of  rational  self-government,  as  defined  by  Jon  Doyle. 
Architecturally,  Doyle  and  Ramesh  Patil  are  developing  the  principles  and  structures  for 
knowledge bases which rationally manage their knowledge and inference methods. We expect
this organization for knowledge representation systems to have many advantages over
the current crop of systems. Many current systems restrict the expressive power of their
languages in order to gain “efficiency,” but as our recent work has shown, this sort of “efficiency”
defeats  the  most  important  uses  of  these  systems.  Other  systems  are  more  expressive,  but 
are  still  limited  by  their  inference  methods,  which  apply  mainly  logical  inference  procedures 
in  ways  independent  of  the  user’s  goals.  This  also  makes  for  inefficiency,  as  these  systems 
sometimes  prevent  themselves  from  satisfying  the  user’s  needs  by  wasting  effort  on  inferences 
irrelevant  to  those  needs.  The  goal  of  this  study  is  to  develop  a  set  of  architectural  principles 
that  permit  design  and  development  of  new  representation  systems  that  explicitly  take  into 
account  the  objectives  and  constraints  that  they  are  to  satisfy. 

Complementing this, Doyle and Michael Wellman are studying how to achieve rationality in
the process of developing and revising plans for large distributed activities. The main focus
in this work will be on developing techniques for rational distributed reason maintenance.
All current reason maintenance systems carry out unbounded computations at each database
cycle. Though they save effort over previous approaches to belief revision by only examining
a portion of the knowledge base when effecting changes, that portion may be much or even
all of the knowledge base. Thus these systems are ill-suited for real time or rationally guided
operations, as they provide no way for the user to control the effort spent on a revision
or the distribution of effort over time. The aim of rational distributed reason maintenance
is to make revision computations as local as possible, with pursuit of possible revisions
across distances or different media under the influence of the goals and circumstances of the
reasoner. In the theoretical work, Doyle plans to continue development of a mathematical
theory of reason maintenance and rational self-government. In addition, he and Elisha Sacks
plan to investigate the applicability of more techniques of modern mathematics to qualitative
reasoning about physical systems.

In addition to such fundamental work on knowledge representation issues, we are also
planning to develop representation schemes at a higher, more specifically medically-relevant
level of detail. Inspired by the qualitative probabilistic network representation pioneered by
Wellman [310], Tze-Yun Leong is developing a taxonomy of the structural aspects of a decision
problem. The intent is to represent such concepts as clinical contexts, classes of therapeutic
interventions, causality, and dependency. By gaining insights into the structure of clinical
decisions, this exercise serves as a step toward realizing the uniform knowledge representation
language for an integrated artificial intelligence and decision analysis system for medical
reasoning.

This work will produce an appropriate representation for the formulation of decision
problems according to classical and recent models of decision making, such as qualitative or
quantitative probabilistic networks, influence diagrams, or decision trees. We plan to pursue
this work in the domain of pulmonary infiltrates in AIDS patients, a field in which Frank
Sonnenberg is pursuing parallel and somewhat more applied studies. We plan to take
advantage of his work and thoughts, using them as a basis for the more theoretical work to
be done here. In particular, in the coming year we hope to have developed a set of
representation conventions that are completely adequate to describe any modeling issues that
arise in the course of considering an AIDS/pulmonary infiltrate case. By looking at more
clinical cases, the following complicated issues will be further explored and their implications
on the proposed representation framework analyzed: contextual representation, representing
multiple taxonomies, classification along multiple perspectives (in the same taxonomy), and
the resulting interactions among the concepts. Theoretical formalization of the representation
framework will be attempted, and the resulting expressiveness of the framework will
be evaluated. We expect that the constellation of issues uncovered here will generalize to a
much broader set of medical domains.

3.2.2  Fundamental  Methods

We are focusing on a number of important fundamental techniques for medical reasoning:
(1) an elegant and flexible formulation of diagnosis, (2) a powerful method of temporal
reasoning and temporal belief maintenance, and (3) a number of interesting and critical
approaches to learning in expert systems.


Diagnostic Reasoning

Although the diagnostic algorithms of most medical AI systems have been based on a
variety of ad hoc computational mechanisms, recent research in the formulation of diagnostic
problems across other domains of AI has identified a more systematic analysis of the bases
of diagnostic reasoning. Thomas Wu is investigating case structuring, a sophisticated
strategy for solving complex medical diagnostic problems in the broad domain of general internal
medicine. In a difficult, multiple-disease medical case, a diagnostic system could help a
physician greatly by computing alternative formulations of the case. Case structuring generates
such alternative formulations. Instead of evoking disease solutions directly (and
haphazardly) from the symptoms in the medical case, our approach introduces an intermediate
clustering step to identify coherent aggregates of symptoms. Each aggregate of symptoms is
caused by a separate disease and therefore represents a separate differential diagnostic task
to be solved. A coherent set of tasks that explains the entire case is called a task
formulation; the set of task formulations constitutes the alternative formulations of a medical case.
Due to the intermediate clustering step, this method of case structuring is called symptom
clustering.
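
The intermediate clustering step can be illustrated with a minimal sketch. It is not Wu's algorithm, which this report does not specify: the toy knowledge base `CAUSES`, the disease and symptom names, and the function names are all hypothetical. The sketch enumerates every partition of the symptoms and keeps those in which each cluster has at least one disease able to cause all of its symptoms; each surviving partition plays the role of a task formulation.

```python
from typing import Dict, Iterator, List, Set

# Hypothetical toy knowledge base: symptom -> diseases that can cause it.
CAUSES: Dict[str, Set[str]] = {
    "fever": {"pneumonia", "hepatitis"},
    "cough": {"pneumonia"},
    "jaundice": {"hepatitis"},
}

def partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of items into non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing cluster in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or give it a new cluster of its own.
        yield [[first]] + part

def explanations(cluster: List[str]) -> Set[str]:
    """Diseases that can cause every symptom in the cluster."""
    return set.intersection(*(CAUSES[s] for s in cluster))

def task_formulations(symptoms: List[str]):
    """Partitions in which every cluster has a single-disease explanation."""
    result = []
    for part in partitions(symptoms):
        tasks = [(frozenset(c), explanations(c)) for c in part]
        if all(diseases for _, diseases in tasks):
            result.append(tasks)
    return result
```

Exhaustive enumeration grows very quickly with the number of symptoms, so it is only workable for toy cases; it is used here to make the definitions concrete, not to suggest a practical search strategy.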

Symptom clustering derives its computational power from two sources. First, it exploits
mutual constraints that derive from the symptoms in a case. Second, it takes advantage of
the dual observation that many diseases map onto a few functional derangements and that
each functional derangement maps onto several co-occurring symptoms. These functional
derangements, called syndromes in clinical practice, offer a powerful source of heuristic
knowledge for finding the correct task formulation. In the past year, Wu has identified the
case structuring strategy of medical problem solving, developed algorithms for the symptom
clustering methodology, and implemented a prototype diagnostic system.

In the next year, he plans to extend the theoretical framework of case structuring, refine
the prototype diagnostic system, and evaluate the results. The extensions will cover a
number of broad issues in medical problem solving, including causal relationships, probabilistic
assessment, and test generation and problem solving strategies.

Causality in the current methodology is very shallow, allowing for only pathophysiological
causality between diseases and the symptoms they can cause. However, diseases may cause
other diseases, or they may synergistically enhance or oppose each other, thereby changing
their symptomatology. These processes of causal predisposition and causal interaction can
be incorporated into the symptom clustering methodology by modifying the knowledge base
and diagnostic algorithm.

Probabilistic notions are missing in the current framework, which is unrealistic since the
likelihood of diseases and causal influences ranges over several orders of magnitude. Therefore,
Wu will introduce prior probabilities for diseases and causal probabilities for associations
between symptoms and diseases. These probabilities will entail several considerations:
they can be changed by contextual information, such as age, sex, race, occupation,
and history; they can help guide the search for plausible task formulations; and they
can help determine symptoms that do not need to be explained.

Test generation and problem solving strategies are needed to make the diagnostic system
truly interactive. The current framework receives symptoms passively from the physician
user; with test generation capabilities, it could actively seek relevant symptom data. Problem
solving strategies are sequences of tests, representing the procedural knowledge that medical
experts use to solve cases. Wu plans to incorporate test generation and problem solving
strategies in a separate system to complement the existing diagnostic system.

A prototype diagnostic system has been implemented and tested on a small medical
knowledge base. To fully test the strengths and limitations of our approach, however, we plan to
use a large knowledge base developed by the INTERNIST project. The form of the INTERNIST
knowledge base is suitable for this diagnostic algorithm. Evaluation of our approach will
therefore be based upon empirical comparisons between case structuring and the direct
approach of the INTERNIST system.


Temporal  Reasoning 


Thomas Russ is continuing the development of the Temporal Control Structure for creating
expert systems that use time dependent data. The research has identified schemas of
temporal reasoning such as the abstraction of patient state from the analysis of examination and
laboratory results. Support for using hindsight to evaluate previously made decisions was
also provided. A paper titled “Using Hindsight in Medical Decision Making” was accepted
by the Symposium on Computer Applications in Medical Care, to be held in October 1989
[278]. It will be a finalist in the student paper competition sponsored by the Symposium.
This programming methodology and the system built to support it appear especially
effective for the implementation of monitoring and tracking systems. Russ, as part of his
doctoral dissertation, is using the system to implement a system for monitoring the
treatment of patients with diabetic ketoacidosis. We expect this work to be completed by the
end of 1989.

A different problem of temporal reasoning arises in the Heart Failure (HF) Program. One of
the problems with the existing Heart Failure Program is the inability to deal with situations
in which the order of events or the time between cause and effect is important. Part of
the reason for this difficulty is that the knowledge base representation of cause and effect
does not include the properties needed for reasoning about the time relations. During the
past year we have been developing a representation for such time relations that will allow
specifying the essential features without requiring more specificity than is possible. The kinds
of distinctions needed include the time required to produce an effect, the duration of an effect
after the cause has been removed, the type of onset (gradual or acute), and the nature of
findings (sampled versus symptoms with duration). Since the probability of the effect is
often a function of both the duration and the severity of the cause, suitable approximations
of these functions need to be part of the knowledge base. We have been developing a
representation that will capture these relations, allowing reasoning with hypotheses that are
clinically distinct even though they involve the same nodes. We used the representation
for an example knowledge base that captures the important clinical distinctions in a case
that the existing Heart Failure Program is unable to handle. We are currently working on
strategies for limiting hypotheses to only those that are different in significant ways.
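
To make the listed distinctions concrete, here is a minimal sketch of how a cause-effect link carrying such temporal properties might be recorded. It is an illustration only, not the Heart Failure Program's representation: the class name, field names, units, and the simple time-window test are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CausalLink:
    """Hypothetical cause-effect link annotated with temporal properties."""
    cause: str
    effect: str
    min_delay: float        # time (hours) the cause needs to produce the effect
    max_persistence: float  # time (hours) the effect can outlast the cause
    onset: str              # "gradual" or "acute"
    sampled: bool           # True for a sampled finding, False for one with duration

    def effect_possible_at(self, cause_start: float, cause_end: float, t: float) -> bool:
        """Could the effect be present at time t, given when the cause was active?"""
        earliest = cause_start + self.min_delay
        latest = cause_end + self.max_persistence
        return earliest <= t <= latest

# Illustrative values only; they are not clinical data.
link = CausalLink("cause X", "finding Y", min_delay=2.0,
                  max_persistence=6.0, onset="gradual", sampled=False)
```

A fuller version would replace the hard window with the approximate probability functions of duration and severity that the text calls for.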


Learning 

A number of factors have come together to suggest the critical and ever-more-widely
recognized role of learning in medical AI. First, it is widely recognized that the handcrafting
of expert systems is a difficult and time consuming task that could be significantly eased if
part of model construction could be automated. Second, the increasing availability of large
bodies of relatively carefully collected real clinical data means that system builders need
not rely on human expert judgment as exclusively as we once did. We are investigating the
applicability of several learning approaches.

In the context of the Heart Failure Program, we are comparing machine learning approaches,
such as that of ID3 [268], for producing decision trees to the logistic regression approach used
on a large database of patients presenting with chest pain [267]. We made arrangements
with Drs. Selker and D’Agostino [267] to use their database of 5773 cases with chest pain
or shortness of breath in the emergency room. This database has many clinical attributes
as well as the primary final diagnosis. It was used to develop a predictive instrument using
logistic regression analysis to determine the probability that a patient has acute cardiac
ischemia. Because of the care with which the data was collected and the large amount of
data collected on each patient, this is an ideal database for comparing other technologies to
that of logistic regression analysis. Our first step is to use the machine learning program
ID3 to explore the kinds of decision trees that it will generate with the same data. ID3
is representative of a class of machine learning programs that inductively generate decision
trees from examples using statistical tests and heuristics to keep the trees small and limited
to only those categorizations that are statistically justified. Since these technologies have
been enhanced to handle noisy and missing data, they are capable of handling a database
such as this one. We have a working implementation of ID3 (implemented at our institution
by a group of students from Amsterdam) which has been tested on a number of small
data sets. We are currently checking the data and designing experiments leading up to
running ID3 on the training set later in the summer.
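
As a reminder of how this family of programs chooses a split, the sketch below computes the information-gain criterion commonly associated with ID3. It is a simplified illustration, not the implementation mentioned above: the attribute names and the four example rows are invented, and the statistical stopping tests and the handling of noisy or missing data are omitted.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_attribute(rows, attributes, target):
    """Attribute with the largest information gain: the next node in the tree."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Invented toy cases; these are not drawn from the clinical database.
CASES = [
    {"st_elevation": "yes", "sex": "m", "ischemia": "yes"},
    {"st_elevation": "yes", "sex": "f", "ischemia": "yes"},
    {"st_elevation": "no", "sex": "m", "ischemia": "no"},
    {"st_elevation": "no", "sex": "f", "ischemia": "no"},
]
```

Applying `best_attribute` recursively to each subset, until a subset is pure or no attribute helps, yields the decision tree.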

[...] Jang to conduct investigations (eventually leading to her Ph.D. thesis) on how
to learn [...] in a medical reasoning program that incorporates both associational
and causal [...]. The primary goal is to create a system that can learn from its
experience [...] be able to use an indication of whether its conclusions were
[...] to refine and revise its knowledge and decision methods.

[...] intends to continue his study of the applicability of formal, algorithmic
learning methods to [...] of medical knowledge.


Intelligent Signal Analysis


Patil and Scott Greenwald have been investigating the use of AI methods to assist in the
interpretation of physiologic signals, in particular the design and testing of algorithms that
automatically analyze two leads of the electrocardiogram. The major emphasis of this work
has been in the continued development of CALVIN, an expert system that exploits
contextual information in the ECG. CALVIN has been developed to enhance the detection of
normal beats and isolated ventricular premature beats in the presence of severe electrode
motion noise and QRS-like artifact. In addition, we also modified ARISTOTLE, a traditional
arrhythmia detection algorithm. Greenwald has developed and evaluated a noise detection
strategy using the ratio of the number of peaks to beats detected within a three second
window. A ratio greater than a threshold signifies the presence of severe noise.
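
The peak-to-beat ratio test can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Greenwald's implementation: the threshold value of 2.0, the function name, and the treatment of windows with no detected beats are all hypothetical, and peak and beat detection are assumed to have been done elsewhere.

```python
def is_noisy(peak_times, beat_times, window_start, window_len=3.0, threshold=2.0):
    """Flag a window as noisy when the peak-to-beat ratio exceeds the threshold.

    peak_times and beat_times are event times in seconds; window_len is the
    three second window from the text; threshold is an assumed value.
    """
    window_end = window_start + window_len
    peaks = sum(window_start <= t < window_end for t in peak_times)
    beats = sum(window_start <= t < window_end for t in beat_times)
    if beats == 0:
        # Assumption: peaks with no accepted beat at all count as noise.
        return peaks > 0
    return peaks / beats > threshold
```

In a clean window nearly every peak belongs to a beat, so the ratio stays close to one; motion artifact adds peaks without adding beats and drives the ratio up.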

In the last year, the results of CALVIN’s detection performance were presented at the
Computers in Cardiology Conference (September 1988) [138]. In addition, the work was discussed
at the Association for the Advancement of Medical Instrumentation Conference (May 1989).
Finally, we plan to present a working real time demonstration of the combined
ARISTOTLE/CALVIN arrhythmia analysis system on a MAC II at the 1989 Computers in Cardiology
Conference (September 1989).

The  future  direction  of  this  work  is  focused  on  improving  the  detection  of  isolated  atrial 
premature  beats  (S),  isolated  premature  ventricular  beats  (V),  and  normal  beats  (N)  in  the 
presence  of  severe  noise.  The  following  projects  will  help  bring  that  goal  closer. 

ECG Database Development: A database containing sinus arrhythmia and atrial ectopic
activity needs to be created for developing and testing CALVIN’s performance on atrial
arrhythmias. A database consisting of twenty half-hour sections (10 for detector development
and 10 for detector testing) will be collected for each of the following arrhythmia classes:


1.  isolated atrial premature beats;

2.  atrial couplets, triplets, runs of atrial tachycardia, and paroxysmal atrial fibrillation;

3.  mixed isolated atrial premature beats and isolated ventricular premature beats;

4.  mixed atrial and ventricular couplets, triplets, and tachycardia;

5.  real-world noise (primarily electrode motion artifact); and

6.  sinus arrhythmia (normals will need to be collected from Beth Israel Hospital Holter
Laboratory).


Improving CALVIN’s Performance: The current CALVIN system works well in
correcting ARISTOTLE’s errors in classifying normal beats and isolated premature ventricular
beats in normal sinus rhythm. However, it continues to make mistakes matching
hypothetical sequences of beats to the raw data in two cases: 1) in regions where the heart rate is
moderately increasing or decreasing (say over 5 or so beats), and 2) in regions where the
heart rate is constant but where its estimate of the heart rate is inaccurate.

One  possible  solution  to  this  problem  is  to  improve  our  estimate  of  heart  rate  (using  beats 
classified  with  high  confidence  within  a  wide  context)  and  to  “stretch”  the  hypothetical 
sequences  of  beats  in  a  nonlinear  fashion  to  account  for  heart  rate  variations.  The  following 
two  projects  would  help  us  meet  this  goal. 

One  important  question  to  answer  is  how  interbeat  intervals  vary  as  a  function  of  heart 
rate.  For  example,  normal-to-ventricular  beat  coupling  intervals  (NV  intervals)  tend  to  be 
constant  to  a  first  approximation  as  heart  rate  varies.  We  need  to  collect  statistics  on  the 
relation  of  NS,  SN,  NV,  and  VN  intervals  as  a  function  of  heart  rate  in  order  to  determine 
how  to  “stretch”  and  match  hypothetical  sequences  of  beats  for  a  particular  heart  rate. 

During  periods  of  noisy  ECG,  the  noise  level  may  drop  momentarily.  During  this  brief 
period  one  or  two  heart  beats  may  become  quite  apparent.  Physicians  frequently  use  these 
“landmark” beats as fiducial points to re-adjust their estimates of heart rate and their
expectations  of  beat  locations. 

We  need  to  develop  a  method  to  find  these  landmark  beats,  and  to  use  them  in  improving 
our  heart  rate  estimate  and  in  selecting  the  correct  hypothetical  sequences  of  beats.  One 
possible  confidence  measure  to  use  to  find  these  landmark  events  is  the  correlation  coefficient 
of  the  data  with  the  best  matched  beat  template.  If  the  correlation  were  above  a  conservative 
threshold,  then  our  confidence  in  the  beat’s  identity  would  be  high  enough  to  update  our 
heart  rate  estimator. 
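
The proposed confidence measure can be sketched as follows. This is an illustration only: the templates, the threshold of 0.95, and the function names are assumptions, and a real detector would first align each candidate segment with the template in time.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sample sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat segment carries no shape information
    return sxy / math.sqrt(sxx * syy)

def landmark_label(segment, templates, threshold=0.95):
    """Return the label of the best matching template, or None if no template
    correlates above the (assumed) conservative threshold."""
    label, best = max(((name, correlation(segment, t))
                       for name, t in templates.items()),
                      key=lambda pair: pair[1])
    return label if best > threshold else None

# Hypothetical five-sample beat template, for illustration only.
TEMPLATES = {"N": [0.0, 1.0, 5.0, 1.0, 0.0]}
```

Because the correlation coefficient ignores gain and baseline offset, a clean beat matches its template even when its amplitude has changed; only beats that pass the test would be used to update the heart rate estimator.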

3.2.3  Probabilities  and  AI 

As part of our strong ongoing interest in the integration of probabilistic and artificial
intelligence methods, we are pursuing, and plan to continue to pursue, a number of related
issues. Fundamental to all this work is the question: what is the most appropriate form in
which to capture and represent probabilistic information in such a way that it is useful to
physicians?


Computations in Probabilistic Networks

Research over the past few years on the evaluation of Bayesian probability networks has led to
the development of new algorithms for handling multiple paths between nodes. In particular,
the approach of Lauritzen and Spiegelhalter [196], while not circumventing the inherent
exponential nature of the problem, seems to be significantly faster than other published
approaches. This approach should be fast enough to handle networks with tens of nodes
and several multiple paths. We are implementing the algorithm to determine its practical
limitations and to apply it to problems of about that order of complexity. The speed of this
algorithm should also make it possible to test the heuristic approach we are using on larger
networks. The empirical tests we conducted over the past two years on the heart failure
program by entering actual cases and critiquing the differential diagnoses have convinced
us that our current heuristic approach to the evaluation of the network (which has about
150 intermediate nodes, about 300 possible terminal nodes, many multiple paths, and even
forward loops) produces good hypotheses [216]. By inspection we have not found better
hypotheses than the ones produced by the heuristic method, but we have not had an exact
method for determining the best hypotheses. Our implementation of the Lauritzen and
Spiegelhalter algorithm will allow us to test the performance of our heuristic algorithm on
somewhat simplified models. Investigation of the Lauritzen and Spiegelhalter algorithm has
taken place and we are beginning the implementation, which should be completed shortly.
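
The role of an exact method as a reference can be shown on a toy network with two paths between the same pair of nodes (A to D through B and through C). The sketch below is brute-force enumeration of the joint distribution, not the Lauritzen and Spiegelhalter algorithm, and every probability in it is invented. Enumeration is exponential in the number of nodes, which is exactly why faster exact algorithms matter for networks of realistic size.

```python
from itertools import product

# Invented conditional probabilities for a four-node network:
# A -> B, A -> C, B -> D, C -> D (two paths from A to D).
P_A = 0.2
P_B_GIVEN_A = {True: 0.9, False: 0.1}
P_C_GIVEN_A = {True: 0.8, False: 0.2}
P_D_GIVEN_BC = {(True, True): 0.95, (True, False): 0.6,
                (False, True): 0.5, (False, False): 0.05}

def bernoulli(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p

def joint(a, b, c, d):
    """Joint probability of one complete assignment, by the chain rule."""
    return (bernoulli(P_A, a) * bernoulli(P_B_GIVEN_A[a], b)
            * bernoulli(P_C_GIVEN_A[a], c) * bernoulli(P_D_GIVEN_BC[(b, c)], d))

def posterior_a(evidence):
    """Exact P(A = True | evidence); evidence maps node names to booleans."""
    numerator = denominator = 0.0
    for a, b, c, d in product([True, False], repeat=4):
        world = {"A": a, "B": b, "C": c, "D": d}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(a, b, c, d)
        denominator += p
        if a:
            numerator += p
    return numerator / denominator
```

Posteriors computed this way on small or simplified models give a ground truth against which the hypotheses of a heuristic evaluator can be compared.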


Probabilistic  Reasoning  for  Genetic  Pedigrees 

We plan to continue a collaborative effort principally undertaken during the past year with
Susan Pauker, responsible for the clinical genetics counseling program of the Harvard
Community Health Plan, and her genetic counseling colleagues. Pauker has a longstanding
interest in the application of probabilistic reasoning methods to her counseling practice [255][256].
Nomi Harris, as part of her Master’s thesis [155], has completed a program that employs
the techniques of Bayes networks [257] to solve the probabilities of a consultant’s risk of
abnormality for arbitrary pedigrees, including cases of inbreeding. The program handles
diseases that may be dominant, recessive or sex-linked, and has facilities for dealing with
incomplete penetrance, mutation, time-varying likelihood of expression, etc. A paper
describing this work has been accepted for the upcoming SCAMC meeting and is also a finalist
in the student paper competition [154].
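
The kind of calculation involved can be seen in a textbook special case, sketched below. It is not Harris's program and covers none of its generality: it assumes an X-linked recessive disease, full penetrance, and no new mutation, and the function name is hypothetical. A woman with a given prior probability of being a carrier has some number of sons, all unaffected; Bayes' rule updates her carrier probability.

```python
from fractions import Fraction

def carrier_posterior(prior, unaffected_sons):
    """Posterior probability that a woman carries an X-linked recessive allele,
    given that all of her sons are unaffected.

    Assumes full penetrance and no mutation, so a carrier's son is unaffected
    with probability 1/2 and a non-carrier's son is always unaffected.
    """
    prior = Fraction(prior)
    likelihood_carrier = Fraction(1, 2) ** unaffected_sons
    likelihood_noncarrier = Fraction(1)
    numerator = prior * likelihood_carrier
    return numerator / (numerator + (1 - prior) * likelihood_noncarrier)
```

For a prior of 1/2 and two unaffected sons the posterior works out to (1/8) / (1/8 + 1/2) = 1/5. Arbitrary pedigrees, inbreeding, and incomplete penetrance break this simple pattern, which is what motivates a network formulation.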

We have now ported the computational core of that program to personal computers (the Macintosh
and a fully equipped 386-based PC), and are investigating the design of appropriate user
interfaces. In addition, we plan to put this program into the hands of practicing genetic
counselors within the next few months, asking them to perform retrospective evaluations
of some of their clinical cases and, based on feedback from their experience, outline what
additional capabilities are necessary to support the analyses performed by these counselors.
We also plan to explore the adoption of the Spiegelhalter and Lauritzen network algorithm
to speed up the program on complex, interbred pedigrees.


Extracting  Probabilistic  Information  from  Partial  Databases 

We are also using the Selker and D’Agostino database (described above under “Learning”)
to investigate the probabilities in the Heart Failure Program. Since many clinical attributes
of the patients were collected, it is possible to identify subsets of the database that have
properties corresponding to some of the intermediate nodes in the model and determine the
probability and confidence intervals for some of the findings given those states. It is not
possible to do this for all of the nodes in the model because not all of the data was collected,
some of the findings in the model do not exactly correspond to the data collected in the
database, and no secondary diagnoses were recorded in the database. Even so, the data
gives us an opportunity to compare the actual occurrence of some findings in the model to
the probabilities given by our cardiology experts from their clinical experience and knowledge
of the literature. We are currently determining the correspondence between the model nodes
and the database attributes and determining which relations can be tested.
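
Estimating the probability of a finding given a state, with a confidence interval, might look like the sketch below. The report does not say which interval method is used; the Wilson score interval here is one common choice and is an assumption, as are the function names and the record layout (one dict of attributes per patient).

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("need at least one case in the subset")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def finding_given_state(records, in_state, has_finding):
    """Estimate P(finding | state) from the records that satisfy `in_state`.

    `in_state` and `has_finding` are predicates over one record.
    """
    subset = [r for r in records if in_state(r)]
    count = sum(1 for r in subset if has_finding(r))
    return count / len(subset), wilson_interval(count, len(subset))
```

The interval width makes it clear when a subset is too small for its observed frequency to be compared meaningfully with an expert's estimate.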


Decision  Tree  Critiquer 


In  the  past  year,  we  began  a  study  of  the  efficacy  of  the  decision  tree  critiquer  (DTC) 
in  detecting  structural  errors  in  decision  trees.  Recall  that  DTC  was  designed  after  first 
observing  the  kinds  of  errors  that  arose  in  trees  built  by  trainees  on  the  clinical  decision 
making  consultations  service.  In  this  ongoing  study,  initial  trees  presented  by  trainees  and 
students  are  captured  and  translated  from  the  PC  environment  to  the  symbolic  environment 
by  a  neutral  research  assistant  (a  medical  student).  The  critiquing  program  then  lists  the 
potential  structural  errors  identified  by  its  knowledge  base.  Because  of  the  annual  turnover 
of  fellows  and  more  frequent  turnover  of  students  and  visitors  to  the  service,  we  anticipated 
that  the  frequency  of  errors  identified  would  be  of  the  same  order  of  magnitude  as  was  found 
two  years  ago  when  we  initially  developed  our  error  catalog  and  began  to  code  DTC. 

To  our  surprise,  we  have  been  identifying  far  fewer  structural  errors,  suggesting  that  the 
educational  process  has  been  passed  from  trainee  to  trainee.  At  this  point  most  of  the 
“problems”  identified  by  DTC  are  not  true  errors  but  rather  represent  instances  in  which 
relationships  that  might  have  been  represented  structurally  have  been  incorporated  into 
“binding  expressions”  within  the  decision  trees.  Such  expressions  are  essentially  microcoded 
programs that the current implementation of DTC cannot parse into relations among
meaning concepts. We feel that this represents more than just a limitation in DTC. When decision
trees  are  used  by  the  clinical  service,  they  undergo  an  extensive  debugging  process,  which 
in  fact  almost  uniformly  identifies  problems  and  necessary  refinements.  Those  problems, 
as  identified  by  human  experts  (faculty),  almost  always  lie  within  expressions  within  the 
“bindings,”  probabilities,  and  utilities  of  a  decision  tree.  This  concordance  of  error  locations 
suggests  that  our  current  classic  formalism  for  problem  representation,  which  hides  certain 
relationships  within  algebraic  expressions,  is  inadequate. 

One  ready  hypothesis  might  be  that  decision  trees  are  a  less  adequate  representation  than 
influence  diagrams  of  complex  decision  problems.  In  fact,  the  two  formalisms  appear  to  be 
complementary representations, with each making explicit certain relations that are “hidden”
in the other: in the tables of the influence diagram and in the binding structure of the
decision tree.

In the next year, we plan to allow DTC to process the decision trees created by a new crop
of fellows (five new to our division, four new to decision analysis). We also plan to process
a random selection of trees from the published literature, omitting trees created by our
division or by former trainees. Unfortunately, the current DTC implementation is incapable
of examining the details of the model contained within mathematical expressions (utilities,
probabilities, bindings) and only critiques the structure of the decision tree. We plan to
expand  this  system,  developing  a  syntax  and  rules  for  examining  the  content  of  expressions 
and  the  structure  of  Markov  components.  The  examination  of  the  content  of  expressions  first 
requires  that  a  consistent  formalism  be  developed  for  representing  the  semantics  contained 
within  those  expressions,  creating  appropriate  links.  Eventually,  one  would  like  to  be  able 
to  link  those  expressions  to  a  domain-specific  knowledge  base,  but  we  will  first  explore  the 
nature  of  relations  among  variables  without  reference  to  their  medical  content. 

We have already begun an implementation of the existing DTC in the same microcomputer
environment as DecisionMaker, using the Goldworks system. We hope to complete that
implementation in the next six months and begin to examine the feasibility of its use concurrent
with tree development.


Microcomputer  Implementations 


Substantial  portions  of  the  decision  tree  critiquing  system  have  been  ported  to  the  IBM 
environment  in  the  Goldworks  system.  That  partial  implementation  is  functioning,  and  we 
are exploring mechanisms for allowing it to interact conveniently with the evolving
DecisionMaker environment.

DecisionMaker itself has been expanded to include more transparent representation of
subtrees, and we have been exploring alternate visual representations of the implicit information
contained  within  bindings.  We  also  improved  the  scripting  mechanism  and  completed  the 
implementation  of  Monte  Carlo  simulations  within  our  microcomputer  environment. 

DecisionMaker  is  now  a  fairly  stable  product  in  a  Pascal  environment  on  the  IBM  compatible 
computer  family.  We  plan  to  explore  the  feasibility  of  a  direct  port  to  the  Macintosh  world 
using  the  Borland  Pascal  environment.  We  also  plan  to  use  the  new  Borland  object-oriented 
programming  modules  within  their  Pascal  system  to  implement  a  portion  of  the  DTC  (now 
on  Symbolics  and  Goldworks)  system,  and  provide  concurrent  advice  on  tree  construction 
and  the  interpretation  of  the  results  of  sensitivity  analyses. 


Applications

We have been developing an extensive model comparing coronary bypass surgery,
percutaneous transluminal angioplasty, and conservative therapy in patients with angina. That
complex Markov model required expansion of the capabilities of our modeling environment,
which has now been completed. The results of that cost-effectiveness analysis are now
under review. We also developed models of screening for sickle cell disease, the selective
use of cytomegalovirus immune globulin in renal transplant recipients, the expanded use of
thrombolytic therapy in patients with acute myocardial ischemia, and the determination of
occupational risk of HIV infection.


3.3  Publications 

[1] J. Doyle. Constructive belief and rational representation. Computational Intelligence,
5(1):1-11, February 1989.

[2] J. Doyle. Mental constitutions and limited rationality. In M. Fehling and S. Russell,
editors, Papers of the AAAI Symposium on AI and Limited Rationality, 1989.

[3] J. Doyle. Reasoning, representation, and rational self-government. In Z. W. Ras,
editor, Methodologies for Intelligent Systems, 4, pages 367-380, North-Holland, 1989.

[4] J. Doyle and R. S. Patil. Language Restrictions, Taxonomic Classifications, and the
Utility of Representation Services. Technical Report MIT/LCS/TM 387, Laboratory
for Computer Science, May 1989.

[5] J. Doyle and E. P. Sacks. Stochastic analysis of qualitative dynamics. In N. S. Sridharan,
editor, Proceedings of the Eleventh International Joint Conference on Artificial Intelligence,
pages 1187-1192, Morgan Kaufmann, 1989.

[6] J. Doyle and M. P. Wellman. Impediments to universal preference-based default theories.
In R. J. Brachman, H. J. Levesque, and R. Reiter, editors, Proceedings of the First
International Conference on Principles of Knowledge Representation and Reasoning,
pages 94-102, Morgan Kaufmann, May 1989.

[7] M. H. Eckman. A counterpoint to the analytic hierarchy process. Medical Decision
Making, 9:57-58, 1989.

[8] M. H. Eckman, J. R. Beshansky, I. Durand-Zaleski, H. J. Levine, and S. G. Pauker.
Perioperative management of anticoagulation in patients with prosthetic heart valves
undergoing non-cardiac surgery. Medical Decision Making, 8:328, 1988. Presented
at the Society for Medical Decision Making, Tenth Annual Meeting, Richmond, VA,
October 17-19, 1988.

[9] M. H. Eckman and S. G. Pauker. Principles of diagnostic testing. In W. Kelly, editor,
Textbook of Medicine, J. B. Lippincott, 1989.

[10] L. Ellwein and M. H. Eckman. Biological complexity in mathematic modeling. Medical
Decision Making, 9:38-39, 1989.

[11] J. E. Gottlieb and S. G. Pauker. When is a positive PPD falsely positive? A new look
at old tuberculin. Seminars in Respiratory Medicine, 10:218-225, 1989.

[12] S. D. Greenwald, R. S. Patil, and R. G. Mark. Improved arrhythmia detection in noisy
ECGs through the use of expert systems. In Computers in Cardiology, 1988.

[13] M. Hagen, K. Meyer, and S. G. Pauker. Human immunodeficiency virus infection in
health care workers: a method for estimating individual occupational risk. Archives of
Internal Medicine, 149:1541-1544, 1989.


[14]  M.  D.  Hagen,  M.  H.  Eckman,  and  S.  G.  Pauker.  Aortic  aneurysm  in  a  74-year-old  man 
with  coronary  disease  and  obstructive  lung  disease:  is  double  jeopardy  enough?  Medical 
Decision Making, 9(4):285-299, October-December 1989.

[15]  M.  D.  Hagen,  M.  H.  Eckman,  and  S.  G.  Pauker.  The  decision  problem  and  the  decision 
model: choose the right tool for the job. Medical Decision Making, 9(4):325, October-December
1989. Presented at the Society for Medical Decision Making, Eleventh Annual
Meeting.

[16]  I.  J.  Haimowitz,  R.  S.  Patil,  and  P.  Szolovits.  Representing  medical  knowledge  in  a 
terminological  language  is  difficult.  In  Symposium  on  Computer  Applications  in  Medical 
Care,  pages  101-105,  1988. 

[17]  D.  E.  Hirsch,  S.  R.  Simon,  T.  Bylander,  M.  A.  Weintraub,  and  P.  Szolovits.  Using 
causal  reasoning  in  gait  analysis.  Applied  Artificial  Intelligence,  3(2-3):337-356,  1989. 

[18]  R.  L.  Jayes,  N.  S.  Hill,  and  S.  G.  Pauker.  Open  lung  biopsy  in  primary  pulmonary 
hypertension:  a  decision  analysis.  Seminars  in  Respiratory  Medicine,  10:232-241,  1989. 

[19] J. P. Kassirer and R. I. Kopelman. Cognitive errors in diagnosis: instantiation,
classification, and consequences. American Journal of Medicine, 86:433-441, 1989.

[20] J. P. Kassirer and F. A. Sonnenberg. Clinical decision analysis. In W. N. Kelley, editor,
Textbook of internal medicine, pages 30-34, J. B. Lippincott, 1989.

[21] J. P. Kassirer and F. A. Sonnenberg. The scientific basis of diagnosis. In W. N. Kelley,
editor, Textbook of internal medicine, pages 14-16, J. B. Lippincott Co., 1989.

[22]  R.  I.  Kopelman,  R.  A.  McNutt,  and  S.  G.  Pauker.  Use  of  decision  analysis  in  a 
complicated  case  of  renovascular  hypertension.  Hypertension,  12:611-619,  December 
1988. 

[23] P. A. Koton. A medical reasoning program that improves with experience. In
Symposium on Computer Applications in Medical Care, 1988.

[24] E. J. Lamb, M. Hagen, and S. G. Pauker. The mean interval to conception: a measure
of utility for the analysis of decisions involving fertility. American Journal of Obstetrics
and Gynecology, 160:1470-1478, 1989.

[25] W. J. Long. Medical diagnosis using a probabilistic causal network. Applied Artificial
Intelligence, 3(2-3):367-384, 1989.

[26] W. J. Long, S. Naimi, M. Criscitiello, and G. Larsen. Differential diagnosis generation
from a causal network with probabilities. In Computers in Cardiology Conference,
IEEE, 1988.

[27] W. J. Long, S. Naimi, M. G. Criscitiello, and S. Kurzrok. Reasoning about therapy from
a physiological model. Biomedical Measurement Informatics and Control, 2(1):40-45,
1988.


[28] A. J. Moskowitz, B. Kuipers, and J. P. Kassirer. Dealing with uncertainty, risk and
tradeoffs: a cognitive science approach. Annals of Internal Medicine, 108(3):435-449,
1988.

[29] A. J. Moskowitz and S. G. Pauker. A patient with new Q waves: methods for decision
making in the individual patient. J. Am. Coll. Card., 14 (Supplement A):29A-37A,
1989.

[30] S. G. Pauker and R. I. Kopelman. Screening for renovascular hypertension: a which
hunt. Hypertension, 14:258-260, 1989.

[31] A. Porath, J. B. Wong, H. P. Selker, and S. G. Pauker. Thrombolytic therapy for
suspected myocardial infarction: a decision analytic model. In Proceedings of Computers
in Cardiology, IEEE Computer Society Press, 1989.

[32] E. P. Sacks. Piecewise linear abstraction of intractable dynamic systems. International
Journal of Artificial Intelligence in Engineering, July 1988.

[33] H. P. Selker, J. R. Beshansky, S. G. Pauker, and J. P. Kassirer. The epidemiology of
delays in a teaching hospital: the development and use of a tool that detects unnecessary
hospital days. Medical Care, 27:112-129, 1989.

[34] O. Senyk, R. S. Patil, and F. A. Sonnenberg. Systematic knowledge base design for
medical diagnosis. Applied Artificial Intelligence, 3(2-3):249-274, 1989.

[35]  S.  R.  Simon,  M.  A.  Weintraub,  T.  Bylander,  D.  Hirsch,  and  P.  Szolovits.  Dr.  Gait: 
an  expert  system  for  gait  analysis.  In  RESNA  Annual  Conference,  Rehabilitation 
Engineering  Society  of  North  America,  1989. 

[36] F. A. Sonnenberg, J. B. Wong, and S. G. Pauker. Modeling time-dependent parameters
with a variable starting point in a Markov cohort simulation with a limited memory.
Medical Decision Making, 8:342, 1988. Presented at the Society for Medical Decision
Making, Tenth Annual Meeting, Richmond, VA, October 17-19, 1988.

[37] J. Tsevat, I. Durand, and S. G. Pauker. Antibiotic prophylaxis for dental procedures in
patients with artificial joints: worthwhile, by the skin of their teeth. American Journal
of Public Health, 79:739-743, 1989.

[38] J. Tsevat, I. Durand-Zaleski, D. R. Snydman, S. G. Pauker, B. G. Werner, and A. S. Levey.
Which renal transplant patients should receive cytomegalovirus immune globulin? A
cost-effectiveness analysis. Medical Decision Making, 8:344, 1988. Presented at the
Society for Medical Decision Making, Tenth Annual Meeting, Richmond, VA, October
17-19, 1988.

[39] J. Tsevat, M. H. Eckman, R. A. McNutt, and S. G. Pauker. Quality of life considerations
in anticoagulant therapy for patients with dilated cardiomyopathy. Medical Decision
Making, 8:342, 1988. Presented at the Society for Medical Decision Making, Tenth
Annual Meeting, Richmond, VA, October 17-19, 1988.


[40]  J.  Tsevat,  W.  C.  Taylor,  J.  Wong,  and  S.  G.  Pauker.  Isoniazid  for  the  tuberculin 
reactor:  take  it  or  leave  it.  American  Review  of  Respiratory  Diseases,  137:215-220, 
1988. 

[41]  M.  P.  Wellman.  Critiquing  therapy  plans  by  incremental  improvement.  In  AAAI  Spring 
Symposium on Artificial Intelligence in Medicine, pages 101-102, 1988. Extended
Abstract. 

[42]  M.  P.  Wellman.  Review  of  Perry  L.  Miller,  Expert  Critiquing  Systems.  Artificial 
Intelligence,  35:273-276,  1988. 

[43]  M.  P.  Wellman,  M.  H.  Eckman,  C.  Fleming,  S.  L.  Marshall,  F.  A.  Sonnenberg,  and 
S.  G.  Pauker.  Automated  critiquing  of  medical  decision  trees.  Medical  Decision 
Making,  9:272-284,  1989. 


Theses  in  Progress 

[1] D. Aghassi. Evaluating Case-based Reasoning for Heart Failure Diagnosis. Master’s
thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science,  expected 
May  1990. 

[2] D. Fogg. Artificial Intelligence and Optimization Solutions to Multi-criteria Operator
Selection. PhD thesis, MIT Department of Electrical Engineering and Computer
Science, expected August 1990.

[3] S. D. Greenwald. Improved Detection and Classification of Arrhythmias in Noise-corrupted
Electrocardiograms  Using  Contextual  Information.  PhD  thesis,  Harvard  University  and 
MIT  Division  of  Health  Sciences  and  Technology,  expected  May  1990. 

[4]  N.  Harris.  Probabilistic  Belief  Networks  for  Genetic  Counseling.  Master’s  thesis,  MIT 
Department  of  Electrical  Engineering  and  Computer  Science,  expected  May  1990. 

[5]  T-Y.  Leong.  Knowledge  Representation  for  Supporting  Decision  Model  Formulation  in 
Medicine.  Master’s  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer 
Science,  expected  August  1990. 

[6]  T.A.  Russ.  Reasoning  with  Time  Dependent  Data.  PhD  thesis,  MIT  Department  of 
Electrical  Engineering  and  Computer  Science,  expected  August  1990. 

[7] T. Wu. Medical Diagnostic Problem Solving Using Multiple Types of Knowledge. PhD
thesis, MIT Department of Electrical Engineering and Computer Science, expected
August 1990.

[8] A. Yeh. Automatically Summarizing Repetitive Actions and Handling Parameter
Uncertainty in Systems at a Steady-state. PhD thesis, MIT Department of Electrical
Engineering and Computer Science, expected August 1990.




Computer  Architecture  Group 


Academic  Staff 

A.  Agarwal  S.  Ward,  Group  Leader 

Research  Staff 

M. Singh  R. Zak

R.  Jenez 


Graduate  Students 


A. Ayers
C. Barclay
T. Bloomstein
D. Chaiken
M. Cherian
J. Elsbree
H. Houh
K. Kurihara
T. King
S. Kommrusch
B-H. Lim
G. Maa
J. Morrison
J. Nguyen
D. Nussbaum
J. Pezaris
M. Powell
E. Puckett
C. Selvidge


Undergraduate  Students 

E.  Anderson  N.  Osgood 

B.  Dennis  D.  Rho 

M.  Huffman  M.  Roberts 

D.  Jedlinsky  R.  Stata 

S.  Lee  T.  Steele 

P. Pilotte  N. Tarr


Support  Staff 

J.  Bernard  S.  Thomas 


4.1  Introduction 


The 1988-89 year marks the end of the Real Time Systems Group’s 15-year history. RTS
merged with the groups of Dally, Agarwal, and Knight to form the new Computer Architecture
Group, aimed at integrating the ideas, research, and resources of a currently fragmented
community.

The past year has seen substantial contraction of the RTS Group, with the departures of
Robert Zak and Milan Minsky, along with the promotion of John Pezaris from staff to
student. Previously reported research involving the development of the set-associative DRAM
was brought to an orderly finish. The L Project continued as a focus of the group’s efforts,
and the seeds were planted for a new effort to develop a high performance communication
and packaging substrate for chip-level digital modules. Each of these projects is the subject
of a following section.

In Alewife, research has focused on large scale computer architecture and parallel processing
software. The design of ALEWIFE, a scalable cache-coherent multiprocessor, is the vehicle
for much of our research and is a collaborative effort with Tom Knight of the Artificial
Intelligence Laboratory. The ALEWIFE multiprocessor will support multiple models of
computation including shared-memory, message passing and the data parallel model.


4.2 The L Architecture

Continuing research on the L architecture (by Ayers, Minsky, Jenez, Kommrusch, Puckett,
Nguyen, Pezaris, Ward, and others) led to considerable progress despite relative disinterest
on the part of potential funding agencies. Recent efforts have focused on the interface aspect
of L, promoting its viability as a hardware-independent virtual machine semantics rather
than on the specifics of any single hardware implementation.

4.2.1 Macintosh L Implementation

A second implementation of L was made operational on the 68020-based Macintosh II,
providing (1) a very different hardware platform than our previous re-microcoded Explorer, (2)
an evaluation basis for L implementations on conventional processors, and (3) an
environment for the development of transparent trap-and-translate software.

Rather than blindly executing L instructions, the Macintosh implementation of L traps
when L code is encountered and transparently translates it to native 68020 instructions.
The block of native code is cached in local storage, avoiding retranslation of active program
code.

The implementation uses local RAM as a cache for active chunks, following the scheme
developed by Ayers in 1987. This local memory serves both as a cache and as a site for
the highest volatility level of the homogeneous storage model of L. In our current
(single-processor) Mac implementation, references to non-resident chunks (missing chunk faults)
result in access to a disk-resident external chunk space. External chunk references are patched
(by the fault handler) to refer to local names as each chunk is imported, avoiding the time
and cost overheads associated with conventional memory management hardware. The
current scheme amounts to chunk-based virtual memory on the Macintosh, although the code
anticipates sharing of the external chunk space by several L processors each having local
memory.
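
The fault-and-patch behavior described above can be illustrated with a small model. This is only a sketch in Python, not the L implementation: the names `ExternalRef`, `ChunkStore`, and `fetch` are hypothetical, chunks are modeled as plain lists of slots, and a dictionary stands in for the disk-resident external chunk space.

```python
# Hypothetical model of chunk-based virtual memory with reference patching.
# None of these names come from the L system; they are for illustration only.

class ExternalRef:
    """A reference to a chunk that has not yet been imported locally."""
    def __init__(self, name):
        self.name = name


class ChunkStore:
    def __init__(self, external):
        self.external = external  # stands in for the disk-resident chunk space
        self.local = {}           # local RAM acting as a cache of active chunks
        self.faults = 0           # number of missing chunk faults taken

    def fetch(self, holder, slot):
        """Follow holder[slot], importing the target chunk if it is missing."""
        ref = holder[slot]
        if isinstance(ref, ExternalRef):       # missing chunk fault
            self.faults += 1
            chunk = self.local.get(ref.name)
            if chunk is None:
                chunk = list(self.external[ref.name])  # import a local copy
                self.local[ref.name] = chunk
            holder[slot] = chunk               # patch the reference in place
            ref = chunk
        return ref


external = {"a": [ExternalRef("b"), 1], "b": [2, 3]}
store = ChunkStore(external)
root = [ExternalRef("a")]
a = store.fetch(root, 0)   # faults, imports "a", patches root[0]
b = store.fetch(a, 0)      # faults, imports "b", patches a[0]
b_again = store.fetch(a, 0)  # already patched: no fault, no lookup
```

In this model, once a reference has been patched, later accesses through it cost no more than an ordinary local reference, which is the property the text attributes to the scheme.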

4.2.2  Binary  Compatibility  among  Inhomogeneous  Machines 

Our  second  L  implementation  sheds  light  on  a  number  of  interesting  interface  issues.  Our 
goal  has  been  to  establish  the  illusion  of  absolute  binary  compatibility  among  dissimilar 
machines, without imposing the overhead of an interpretive mechanism. We have been able to
demonstrate  such  compatibility  using  simple  (“toy”)  programs  in  recent  months. 

Our  demonstration  involves  starting  a  small  L  program  running  on  one  system  (e.g.,  the 
Explorer); asynchronously interrupting it at some arbitrary point in its execution; copying
the  entire  network  of  chunks  representing  the  computation  (including  data,  program,  and 
program  state)  to  a  dissimilar  machine  (e.g.,  the  Macintosh);  and  continuing  execution 
without  losing  information  or  consistency.  Moreover,  the  program  runs  at  full  native-code 
speeds  on  both  machines. 

4.2.3  Types  as  Approximations 

The  L  base  language,  roughly  a  variant  of  SCHEME  with  compiler-  rather  than  interpreter- 
based  semantics,  allows  most  references  to  type  information  to  be  resolved  at  compile  time 
rather than incurring runtime overhead. The L compiler attempts to provide a
generality/cost/performance continuum between the extremes of (say) Lisp and C by providing for
runtime types but attaching a nonzero cost to their (optional) use, rewarding the author of
a C-like L program by higher performance than that of an equivalent SCHEME-like one.

In addition to encouraging explicit type declaration by the programmer, this compiler
flexibility places a premium on mechanisms for automatic type inference. Work in this area
by  Nguyen  and  Ward  has  led  to  an  interesting  alternative  to  the  mechanisms  of  Milner  and 
others;  our  type  system  promises,  in  addition  to  new  inference  possibilities,  improved  type 
support  for  such  language  features  as  polymorphic  functions,  subtypes,  and  side  effects. 

Our type system views types as compile time approximations of runtime values. An expression
or variable may have a range of types, corresponding to varying amounts of partial
information which can be inferred. The most informative type of a value is the value itself,
while the least is the type any, which applies to all values. The type of a procedural object
is itself a procedure which maps types of the object’s inputs to types of its outputs, whence
(again) an L procedure is its own most accurate type.

The proposed inference algorithm evaluates type expressions whenever possible during
compilation. As a special case, this scheme causes complete compile-time evaluation of simple
functional subexpressions. However, this evaluation is complicated by (1) the possibility
of non-termination (since the type language is Turing universal), and (2) the presence of
side-effects (since the evaluation scheme assumes a simple applicative semantics). Under
various circumstances, including timeouts and assignments, the inference scheme will
revert to a weaker approximation as the type of an expression. Thus if x is the target of
two assignments of types 5 and 2.718, or alternatively of the weaker types integer and real,
the algorithm will infer a type for x such as any, number, or (union 5 2.718). The use of
approximations allows us to guarantee (1) termination of the inference algorithm, and (2)
functionality in the type domain. It sacrifices, of course, any hope of completeness claims
for our system: our compiler will fail to discover certain inferrable types if it is forced to
discard information in its approximations.
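
The idea of weakening a type to a coarser approximation can be sketched as follows. This is an illustrative Python model, not the algorithm of Nguyen and Ward: the hierarchy in `PARENT`, the `budget` parameter, and all function names are assumptions introduced here to reproduce the x example from the text.

```python
# Hypothetical type hierarchy: a value is its own most precise type, and
# may be weakened to integer/real, then number, then any.
PARENT = {"integer": "number", "real": "number", "number": "any", "any": None}


def type_of_value(v):
    """Weaken a concrete value to a named type (assumed hierarchy)."""
    if isinstance(v, int):
        return "integer"
    if isinstance(v, float):
        return "real"
    return "any"


def ancestors(t):
    """Chain of named types from t up to 'any'."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain


def join(t1, t2):
    """Least named type in the hierarchy that covers both t1 and t2."""
    up = ancestors(t1)
    for t in ancestors(t2):
        if t in up:
            return t
    return "any"


def infer_assigned(values, budget):
    """Type for a variable that is the target of several assignments.

    `budget` is a stand-in for how much precision the compiler retains:
    2 keeps the exact union of values, 1 keeps a named type, 0 gives up.
    """
    if budget >= 2:
        return ("union",) + tuple(values)
    if budget == 1:
        t = type_of_value(values[0])
        for v in values[1:]:
            t = join(t, type_of_value(v))
        return t
    return "any"


precise = infer_assigned([5, 2.718], budget=2)   # ("union", 5, 2.718)
coarser = infer_assigned([5, 2.718], budget=1)   # "number"
coarsest = infer_assigned([5, 2.718], budget=0)  # "any"
```

Every answer is a sound approximation of the values x may hold; the coarser ones simply carry less information, which is the trade the text describes between guaranteed termination and completeness.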

4.2.4  Cartesian  Network  Relative  Addressing 

Unlike  conventional  machine  architectures,  the  programming  model  for  L  imposes  no  bound 
on the size of its addressable universe. To exploit this flexibility, work by Morrison has
explored memory systems based on an addressing scheme called Cartesian Network-Relative
Addressing (CNRA).

The CNRA architecture attempts to maximize scalability by using a novel addressing
technique that provides some of the advantages of both global shared memory models and “local”
non-shared memory models. This addressing technique assumes that the multiprocessor is
built with a direct intercommunication network. Addresses in the CNRA system are
composed of a “routing” component and a “memory location” component. The routing
component indicates a path through the interconnection network. (The origin of the path is
the node on which the address resides.) The memory location component is the memory
location to be addressed on the node indicated by the routing component.
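
A minimal sketch of such a two-part address is given below, in Python. It assumes, purely for illustration, that nodes sit at integer coordinates of a 3D grid and that the routing component is encoded as a displacement (dx, dy, dz) from the node holding the address; the text does not specify the encoding, and the function names `resolve` and `rebase` are hypothetical.

```python
# Hypothetical encoding of a network-relative address: (route, offset),
# where route is a displacement from the node on which the address resides.

def resolve(holder, address):
    """Return the (node, offset) named by `address` stored on node `holder`."""
    route, offset = address
    node = tuple(h + r for h, r in zip(holder, route))
    return node, offset


def rebase(address, old_holder, new_holder):
    """Rewrite an address so that it names the same location after it is
    copied from old_holder to new_holder."""
    route, offset = address
    new_route = tuple(o + r - n
                      for o, r, n in zip(old_holder, route, new_holder))
    return new_route, offset


addr = ((1, 0, -2), 0x40)                    # one node over in x, two back in z
target = resolve((3, 3, 3), addr)            # node (4, 3, 1), offset 0x40
moved = rebase(addr, (3, 3, 3), (0, 0, 0))   # same target, seen from the origin
```

Because the route is relative, the same location has a different address on every node, so an address must be adjusted whenever it moves between nodes; no node ever needs a globally unique name, which is why the address space is not bounded by the address width.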

This addressing system offers the unlimited address space provided by local non-shared
memory models, but allows easy sharing of data structures in the style permitted by global
shared memory machines. The thesis discusses how a practical CNRA system might be built.
There are discussions on how the system software might manage the “relative pointers” in
a clean, transparent way; solutions to the problem of testing pointer equality; protocols and
algorithms for migrating objects to maximize communication locality; garbage collection
techniques; and other aspects of the CNRA system design. It is clear that the CNRA
system is scalable (in terms of demonstrating a way to connect many processors). However,
whether or not the system will work well will depend on the communication behavior of
large multiprocessor programs. Since this is not yet well understood, simulation will be
required to test the viability of the CNRA architecture.


4.3 NuMesh

This  section  reports  very  preliminary  thoughts  on  a  proposed  new  research  effort  involving 
Ward,  Dally,  Agarwal,  Knight,  and  others.  The  idea  is  currently  in  its  fetal  stage. 

4.3.1  Introduction 

Over  the  past  two  decades,  the  backplane  bus  has  dominated  computer  architectures  as  the 
mechanism  for  intermodule  communications.  The  reasons  for  this  dominance  are  simple 
and  remain  compelling:  a  well-designed  bus  provides  a  simple,  extensible  communications 
substrate  which  allows  modules  performing  a  variety  of  computational  tasks  (and  produced 
by a variety of manufacturers) to be assembled into coherent systems. It induces a
Tinkertoy-set modularity at the system configuration level, allowing system designers to build systems
without redesigning every component.

The technical limitations of buses are well known, however. Since they serialize all
system-level communications, they constitute a communication bottleneck, and one whose capacity
remains roughly constant as the system size is increased. Moreover, the timing of a bus
is constrained by a fundamental space/time tradeoff: the time taken by each transaction
must accommodate the physical length of the bus, ensuring a relatively low communication
bandwidth on all but trivially short buses. Other overhead, such as the need to arbitrate
the shared communication resource among competing requests, further reduces the viability
of the bus as a basis for communication in high performance systems.

The following paragraphs propose the development of a communications substrate which
affords Tinkertoy-set modularity and very high performance communication in a constrained
but interesting range of applications. The approach involves standardizing the mechanical,
electrical, and logical interconnect among modules arranged in a (partially populated) 3D
mesh whose lowest level communications follow pre-compiled systolic patterns. The
attractiveness of the scheme derives from the separation of its communications and processing
components, and the standardization of the interface between them. This decoupling of
computation from the communication substrate allows the continued exploitation of
mass-produced processing elements, e.g., contemporary signal processing chips.

The  goal  is  a  set  of  hardware  modules  and  support  software  which  allows  high  performance, 
special  purpose  multiprocessors  to  be  configured  for  particular  applications  in  a  matter  of 
hours. Applicability of processors so configured will be restricted to relatively static,
limited-connectivity algorithms such as those found in signal processing, graphics, or other real time
applications: this proposal does not address the problem of general purpose multiprocessing
(for which alternative proposals abound). It has the potential, however, of delivering
economical supercomputer power in an interesting but limited set of application domains.

4.3.2  Modules 

Each component of our system is a computational module, perhaps occupying a two-inch
cube (or conceivably much smaller). Each module contains common circuitry devoted to low
level communications and control functions, as well as one or more off-the-shelf chips which
perform computation. Initially, we envision modules containing a modern 60-MFlop DSP
and (say) a modicum of additional memory; eventually, a repertoire of compatible modules
offering varied functionality (sharing the communications circuitry) might evolve.

The common circuitry (likely one or several ASICs) includes a communication/control
processor and a number of receiver/transmitter pairs which drive lines to neighboring modules. The
communication ports (and accompanying mechanics, connector technology, etc.) allow the
modules to be configured into a limited-connectivity network; our preference is a 6-neighbor
3D mesh, although other topologies are of course possible. Operation of the entire array is
synchronized to a fast clock, whose period is the minimum time necessary to transfer a data
word (32 bits?) between adjacent nodes. The limited distances (an inch) and fixed
mechanics should allow this time to be quite fast, perhaps 10 nanoseconds or less. A variety of other
functions, such as trimming clock skew, intermodule control, and processor/communications
synchronization, are also performed by the common circuitry.
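
As a rough worked example of what these tentative figures imply, the Python sketch below computes the bandwidth of one link and of one module. The 32-bit word and 10 ns period are the values floated in the text; the assumption that all six outgoing ports of a module can transmit on every clock is an optimistic upper bound introduced here, and the function name is hypothetical.

```python
# Back-of-envelope link bandwidth from the tentative figures in the text.
WORD_BITS = 32         # word size suggested (with a question mark) above
CLOCK_PERIOD_NS = 10   # clock period suggested as achievable
NEIGHBORS = 6          # 6-neighbor 3D mesh


def link_bandwidth_bytes_per_s(word_bits, period_ns):
    """One word crosses one link per clock period."""
    words_per_s = 1e9 / period_ns
    return words_per_s * word_bits / 8


per_link = link_bandwidth_bytes_per_s(WORD_BITS, CLOCK_PERIOD_NS)
per_module_out = per_link * NEIGHBORS   # upper bound: every port busy

print(f"per link:   {per_link / 1e6:.0f} MB/s")
print(f"per module: {per_module_out / 1e9:.1f} GB/s outgoing (upper bound)")
```

Under these assumptions each link carries 400 MB/s. Unlike a bus, the aggregate figure grows with the number of modules, since each added module brings its own links.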

Initial prototypes will undoubtedly use pre-packaged off-the-shelf chips for the
function-specific portion of each module. Assuming wild success of the NuMesh and bandwagon
momentum  among  suppliers  of  high  performance  silicon,  however,  one  might  imagine  each 
manufacturer  packaging  and  bonding  chips  directly  into  a  pre-constructed  NuMesh  package. 

4.3.3  Communications 

Each  communications  processor  consists  of  a  simple  FSM  which  follows  a  periodic  sequence 
of  I/O  transactions.  It  has  a  small  number  of  registers,  each  of  which  is  addressable  by 
the  DSP  as  an  external  memory  location.  The  transition  table  of  the  FSM  (in  RAM)  can 
be  programmed  to  read  inputs  from  various  neighbors  into  registers  and  send  outputs  from 
various  registers  to  other  neighbors  on  each  clock  cycle.  In  the  most  ambitious  configuration, 
any  port  may  be  read  or  written  (or  perhaps  both,  since  we  presume  the  lines  to  be  unidirec- 
tional)  on  each  clock  cycle.  Implementation  considerations  may  dictate  further  restrictions; 
e.g.,  allowing  only  one  output  datum  (perhaps  to  several  destinations)  and  one  input  datum 
per  clock  cycle.  The  latter  restriction  is  suggested  by  an  implementation  involving  internal 
input  and  output  buses  interconnecting  separate  receiver/transmitter  chips  for  each  port.  A 
myriad  of  other  compromises  are  possible,  depending  on  resources  at  the  technological  level. 

The general idea is that each module’s communications FSM be programmed to follow a
periodic pattern of interactions with neighbors. Although the interactions may vary among
processors, the periods will be identical. If module A transfers a word to its right-hand
neighbor B on clock 37 of each period, then A’s FSM will be programmed to drive its lines
to B on that clock, while B will be programmed to load in data from A. By appropriate design
of transition tables, arbitrary systolic communication patterns may be implemented among
processors. In some cases, words loaded by a module are destined to be read subsequently by
that module’s DSP; in other cases, they are routed (typically on the next clock) to another
neighbor without DSP intervention or even awareness.
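
The periodic, table-driven transfers described above can be modeled in a few lines of
Python. This is only an illustrative sketch, not the NuMesh hardware design: the names
Module and run_period, the layout of the per-clock table, and the three-module line
A-B-C are assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """One node: a few registers plus a per-clock transition table (hypothetical layout)."""
    name: str
    regs: dict = field(default_factory=dict)
    # table[clock] = list of (action, neighbor_name, register_name)
    table: dict = field(default_factory=dict)


def run_period(modules, period):
    """Step every module through one period of `period` clocks.

    On each clock all sends are placed on the wires first, then all
    receives latch them, mimicking a lock-step synchronous transfer.
    """
    for clock in range(period):
        wires = {}  # (source, destination) -> word in flight on this clock
        for m in modules:
            for action, nbr, reg in m.table.get(clock, []):
                if action == "send":
                    wires[(m.name, nbr)] = m.regs[reg]
        for m in modules:
            for action, nbr, reg in m.table.get(clock, []):
                if action == "recv":
                    m.regs[reg] = wires[(nbr, m.name)]
    return {m.name: m for m in modules}


# A sends r0 to B on clock 0; B forwards the word to C on clock 1
# without involving B's compute chip.
a = Module("A", regs={"r0": 42}, table={0: [("send", "B", "r0")]})
b = Module("B", table={0: [("recv", "A", "r1")], 1: [("send", "C", "r1")]})
c = Module("C", table={1: [("recv", "B", "r2")]})
result = run_period([a, b, c], period=2)
```

Because sender and receiver tables are written to agree on the clock of each transfer,
no handshaking or addressing appears in the model; a mismatch between the two tables
would surface here as a missing wire entry.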

Certain algorithms may benefit from flow control and other synchronization measures in their
underlying communications. These might be superimposed on the primitive (branch-free)
communication mechanism by software convention, allowing certain data words to contain
control  information.  It  is  possible  that  analysis  of  potential  application  code  will  suggest  the 
addition  of  hardware  support  for  such  control  purposes. 

Additional  protocol  provisions  allow  the  communications  FSMs,  and  perhaps  the  functional 
circuitry attached to them, to be programmed. This process is viewed primarily as a
bootstrapping operation, and may be relatively slow; however, the potential for time-varying
communication  patterns  may  eventually  be  explored. 

4.3.4  Software 

The  rapid  prototyping  of  ad  hoc  multiprocessors  depends  on  automation  of  various  aspects 
of the design task, including (1) design of the network topology, (2) allocation of
computational tasks to processors, (3) specification of details regarding timing and direction of
communications for each module/clock pair, and (4) programming of the DSP. While steps
(1) and (2) are the most challenging, they are amenable to partial solutions (e.g., involving
interaction  and  direction  from  the  designer)  and  benefit  enormously  from  the  restriction  to 
static  algorithms  with  time-  and  space-bounded  components. 

Code  generation  involves  an  accurate,  detailed  model  of  DSP  timing — perhaps  including 
cache  operation.  However,  hardware  provisions  (e.g.,  bit/register  R/W  sync  bits,  like  I- 
structures)  might  provide  some  timing  latitude  in  DSP-FSM  synchronization.  Placement 
and  routing  aspects  of  system  design — mapping  a  graph  of  time-bounded  computations  to 
a  grid  of  processors — can  probably  benefit  from  progress  in  adjacent  domains  of  algorithm 
research. 

We  emphasize  that  our  choice  of  a  3D  interconnect  topology  stems  not  from  the  3D  nature 
of  intended  applications  but  from  the  3D  nature  of  our  physical  universe.  We  anticipate  that 
the NuMesh interconnect will perform well in any computation characterized by a static
sparsely-connected  communications  graph  amenable  to  efficient  embedding  in  3-space. 

4.3.5  LegoFlops  as  a  Research  Goal 

A  major  attraction  of  this  scheme,  and  variants,  is  its  promise  of  mind-boggling  performance 
in a limited but conspicuous class of applications. Unlike more ambitious approaches to
multiprocessor architecture, commitment of engineering talent, money, and corollary resources
is almost certain to produce splashy results (i.e., huge MFlops and MFlops/parameters) and
very impressive demos (speech, graphics, etc.). By riding the coattails of highly-engineered
DSPs, we leverage the real muscle of the remaining domestic semiconductor industry; indeed,
it is difficult to imagine TI and Motorola not vying strongly for an opportunity to participate
(and to implant their respective DSP chips). The technical risks are low: the architectural
schema can certainly be made to work; the software, while challenging at its most ambitious
extreme, admits many clearly practical compromises. Perhaps the biggest challenges involve
the basic physics of the system: mechanics, connectors, cooling, and power distribution. Here
there is unlimited opportunity for cleverness, optimization, and state-of-the-art engineering.
Like most system-building proposals, this one is resource intensive: to achieve its potential,
it  will  require  staff  engineers,  VLSI  fab,  proselytizing  and  support  of  a  user  community,  and 
the better part of a decade of LCS commitment. However, it seems to be an area where sufficient
financial commitment virtually guarantees interesting and conspicuous results, results
which,  unlike  the  incestuous  tools  of  the  computer  science  community,  enable  breakthroughs 
in  real  (non-CS)  applications.  The  ultimate  attraction  of  this  commitment  to  LCS  and 
MIT  may  be  the  distinction  it  engenders  in  other  areas:  a  period  of  MIT  supremacy  in 
speech recognition, imaging, dynamic graphics, control robotics, and a host of other client
disciplines. 


4.4  Alewife 

The  Alewife  multiprocessor  consists  of  a  set  of  processing  nodes  interconnected  via  a  low 
latency  network.  A  high  speed  processor,  a  large  coherent  cache,  memory  and  a  cache- 
memory-network  controller  constitute  each  processing  node.  The  current  version  of  the 
network  uses  a  topology  of  the  Omega  [198]  (or  Banyan)  class  of  networks  and  is  circuit- 
switched  to  allow  low  latency  communications.  Our  research  also  addresses  scalable  fat-tree 
[203]  and  low  dimension  direct  networks  [91]  that  display  locality  and  can  provide  quick 
access  to  neighboring  memory  modules  without  requiring  a  full  network  traversal.  The 
processor,  called  APRIL  [207],  permits  rapid  context  switching  through  the  use  of  multiple 
register  files  and  includes  support  for  efficient  synchronization  and  handling  of  Futures. 

In  the  software  arena,  the  parallel  Mul-T  system  developed  at  the  Laboratory  for  Computer 
Science  in  the  Parallel  Processing  Group  is  being  adapted  for  our  use.  The  Mul-T  system 
includes  a  production  quality  compiler  for  parallel  applications.  We  have  developed  T- 
Mul-T,  an  address  tracing  system  that  produces  traces  of  parallel  applications  written  in 
Mul-T. T-Mul-T can also be interfaced to a cache-memory system simulator, which in turn
interfaces  to  an  interconnection  network  simulator.  The  coupling  of  trace  generation  and  the 
memory  system  simulator  allows  rapid  simulation  of  various  system  configurations  without 
introducing  time  distortions  in  results;  it  also  has  speed  advantages  over  a  software  processor 
simulator,  and  does  not  incur  trace  storage  overhead.  However,  a  processor  simulator  for  our 
APRIL  processor  is  also  necessary,  because  APRIL  is  sufficiently  different  from  the  processor 
we  are  currently  tracing.  Such  a  simulator  is  currently  being  developed  to  replace  the  trace 
generation  backend  of  our  simulation  system. 

The  principal  areas  of  our  research  in  the  past  year  included: 


1. Multiprocessor data collection tools and techniques;

2. The design of directory systems for large scale cache-coherent multiprocessors;

3.  Low  latency  interconnection  network  design  and  analysis; 

4. Investigation of new synchronization techniques;

5. Design of VLSI processors for parallel computers;

6.  Parallel  processing  software  and  applications; 

7.  Exploiting  locality  to  enable  scaling  of  large  scale  multiprocessors;  and 

8.  Performance  modeling  and  evaluation. 


The  following  is  a  brief  description  of  each  area. 

4.4.1  Multiprocessor  Data  Collection 

Continuing our efforts in parallel trace data collection, we now have a tracer called T-Mul-T
that generates traces for parallel symbolic applications and is written under Mul-T, a
parallel  Lisp  system  described  later.  The  first  implementation  was  done  for  the  Encore 
Multimax.  The  Mul-T  kernel  was  modified  by  David  Kranz  to  simulate  an  arbitrary  number 
of  virtual  processors,  running  on  only  a  single  processor.  The  simulation  switches  to  a 
different  processor  after  each  memory  reference  emitting  a  packet  for  each  reference.  The 
memory allocator was modified to make each processor allocate storage in its own area of
memory  so  that  we  could  study  the  effects  of  locality.  The  global  memory  allocation  used 
by  Mul-T  would  not  make  sense  in  a  large  scale  multiprocessor.  The  context  switching  and 
memory  packet  emission  is  controlled  by  having  the  compiler  insert  code  into  the  instruction 
stream.  The  resulting  simulation  is  very  fast,  only  20  times  slower  than  Mul-T  on  the  Encore 
multiprocessor  itself,  neglecting  I/O  time  for  the  memory  packets  if  they  are  to  be  written 
out  to  disk. 
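
The switching discipline described above can be illustrated with Python generators
standing in for virtual processors. This is a toy model, not T-Mul-T itself: the
generator-based tasks, the (processor, operation, address) packet format, and the
round-robin choice of the next virtual processor are all assumptions made for the sketch.

```python
from collections import deque


def worker(base, n):
    """A toy task: yields one (operation, address) pair per memory reference."""
    for i in range(n):
        yield ("read", base + i)
        yield ("write", base + i)


def interleave(tasks):
    """Run tasks on virtual processors, switching after every reference.

    Returns the merged trace as a list of (processor_id, operation, address).
    """
    trace = []
    ready = deque(enumerate(tasks))
    while ready:
        pid, task = ready.popleft()
        try:
            op, addr = next(task)
        except StopIteration:
            continue  # this virtual processor has finished
        trace.append((pid, op, addr))  # emit one packet per reference
        ready.append((pid, task))      # context switch to the next processor
    return trace


# Each virtual processor works in its own region (0x1000 apart), echoing the
# per-processor allocation areas described in the text.
trace = interleave([worker(pid * 0x1000, 2) for pid in range(3)])
```

A single host processor thus produces a trace in which references from all virtual
processors are finely interleaved, which is the property a downstream cache or network
simulator relies on.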

T-Mul-T  does  two  things  for  us: 


1.  It  allows  us  to  run  a  real  Mul-T  program  on  an  arbitrary  number  of  processors  instead  of 
just  16  (the  number  our  Encore  machine  has).  Because  the  simulation  switches  virtual 
processors  after  every  memory  reference,  it  gives  a  faithful  simulation  of  a  possible  real 
execution of the Mul-T program, but it isolates scheduling and parallelism issues from
the  bus  contention  problems  that  exist  in  the  Encore  machine. 

2.  It  gives  us  parallel  traces  for  real  programs  that  can  be  used  in  cache  and  memory 
network  simulations  to  help  understand  the  locality  issues. 


A  port  of  T-Mul-T  to  the  DEC  Microvax  and  the  MIPS  R2000-based  DECstation  3100 
is  also  partially  complete.  We  have  gathered  several  large  traces  of  symbolic  applications 
written in Mul-T, including MODSIM (a functional simulator), BOYER (a theorem prover),
and  several  other  applications. 

In a joint effort with the IBM T. J. Watson Research Center, Mathews Cherian derived large
parallel FORTRAN traces using a “postmortem scheduling method” that can incorporate
multiple synchronization models. In this technique, a multiprocessor trace is created from
a memory reference trace of the uniprocessor execution of a parallel application. Using the
record of synchronization events contained in the uniprocessor execution trace, a postprocessor
can schedule tasks from the uniprocessor execution trace into a multiprocessor trace in
which the synchronization sections are simulated assuming some model of synchronization.
The scheduler simulates processors generating the requests in a round-robin fashion. Parallel
FORTRAN traces of several popular benchmarks include SIMPLE, WEATHER, and FFT.
We  are  using  these  traces  in  a  wide  variety  of  studies  ourselves,  and  we  plan  to  distribute  our 
trace  data  to  the  research  community  and  to  industry.  Efforts  to  trace  these  applications 
with  modified  algorithms  to  enhance  program  locality  are  also  in  progress. 
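
The postmortem scheduling idea can be sketched as follows. The record format
(("task", id), ("ref", address), ("barrier",)), the static assignment of task i to
processor i mod P, and the one-reference-per-processor barrier model are assumptions
chosen to keep the example short; they are not taken from the actual postprocessor.

```python
def split_phases(uni_trace):
    """Group a uniprocessor trace into phases; each phase is a list of tasks,
    and each task is the list of addresses it references.

    Assumes every "ref" record is preceded by a "task" record in its phase.
    """
    phases, tasks, current = [], [], None
    for rec in uni_trace:
        if rec[0] == "task":
            current = []
            tasks.append(current)
        elif rec[0] == "ref":
            current.append(rec[1])
        elif rec[0] == "barrier":
            phases.append(tasks)
            tasks, current = [], None
    if tasks:
        phases.append(tasks)  # a trailing phase is also closed by a barrier below
    return phases


def postmortem_schedule(uni_trace, num_procs, barrier_addr=0xFFFF):
    """Build a multiprocessor trace of (processor, kind, address) records."""
    out = []
    for tasks in split_phases(uni_trace):
        # Assumed static assignment: task i runs on processor i mod num_procs.
        queues = [[] for _ in range(num_procs)]
        for i, refs in enumerate(tasks):
            queues[i % num_procs].extend(refs)
        cursors = [0] * num_procs
        remaining = sum(len(q) for q in queues)
        while remaining:
            for p in range(num_procs):  # round-robin issue, one reference each
                if cursors[p] < len(queues[p]):
                    out.append((p, "ref", queues[p][cursors[p]]))
                    cursors[p] += 1
                    remaining -= 1
        # Assumed synchronization model: one barrier reference per processor.
        for p in range(num_procs):
            out.append((p, "sync", barrier_addr))
    return out


uni = [("task", 0), ("ref", 10), ("ref", 11), ("task", 1), ("ref", 20),
       ("barrier",), ("task", 2), ("ref", 30), ("barrier",)]
multi = postmortem_schedule(uni, num_procs=2)
```

Swapping the last loop for a different routine is where another synchronization model
would be plugged in; the reference streams themselves are untouched.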

We  continued  tracing  parallel  C  applications  under  the  MACH  operating  system  using  the 
Vax T bit technique [279][134]. Kiyoshi Kurihara has this tracer running on a DEC Microvax
3200  and  is  modifying  it  to  use  MACH  threads  instead  of  UNIX  processes  to  enable  faster 
tracing. 

A slight modification to our parallel T-Mul-T tracer has also enabled the emulation of large
scale multiprocessors, where the underlying processor of the machine on which the simulator
runs substitutes for the processor in the multiprocessor being emulated. We have simulators
for  cache/directory  systems  and  interconnection  networks,  which  can  be  plugged  back  to 
back  to  provide  the  system  backend  to  the  processor  emulator.  The  FORTRAN  postmortem 
scheduler  can  also  be  used  as  the  backend  to  the  multiprocessor  emulator. 

4.4.2  Large  Scale  Cache  Coherence 

David  Chaiken  and  Mathews  Cherian  worked  on  directory  schemes  and  synchronization  in 
large  scale  multiprocessors;  earlier  studies  of  directory  schemes  were  limited  to  small  scale 
systems  of  1  to  16  processors.  We  investigated  the  scalability  of  limited  directory  schemes  [10] 
for  cache  coherence  in  the  large  scale.  In  a  limited  directory  scheme,  the  number  of  pointers 
in  the  memory  for  each  block  can  be  less  than  the  number  of  caches  in  the  system.  Such  a 
scheme works because of the temporal locality property of processors referencing a given memory
block. 
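
A limited directory of this kind can be sketched as below. The class name
LimitedDirectory, the oldest-first choice of which copy to invalidate when the
pointers are exhausted, and the counting of one message per invalidated copy are
assumptions for illustration; the text does not specify these details.

```python
class LimitedDirectory:
    """Directory with a fixed number of pointers per block and no broadcast."""

    def __init__(self, num_pointers):
        self.num_pointers = num_pointers
        self.sharers = {}        # block -> list of cache ids, oldest first
        self.invalidations = 0   # invalidation messages sent so far

    def read(self, cache, block):
        ptrs = self.sharers.setdefault(block, [])
        if cache in ptrs:
            return
        if len(ptrs) == self.num_pointers:
            ptrs.pop(0)          # assumed policy: invalidate the oldest copy
            self.invalidations += 1
        ptrs.append(cache)

    def write(self, cache, block):
        """Invalidate every other copy; return the number of messages sent."""
        ptrs = self.sharers.get(block, [])
        sent = sum(1 for c in ptrs if c != cache)
        self.invalidations += sent
        self.sharers[block] = [cache]
        return sent


d = LimitedDirectory(num_pointers=2)
for cache in (0, 1, 2):
    d.read(cache, block=7)       # the third reader forces one pointer eviction
after_reads = d.invalidations
sent = d.write(1, block=7)       # cache 1 writes; the other remaining copy is invalidated
```

When few caches reference a block between writes, the pointer-overflow path is rarely
taken, which is the temporal locality argument made above.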

Mathews Cherian wrote a cache and directory simulator suitable for simulating large scale
systems that provides several statistics on cache coherence schemes such as the effects of cache
size, block size, number of directory pointers, and the number of processors. The simulator
also provides traffic rates of schemes that do not cache shared variables, and can distinguish
between shared and private traffic, and between synchronization and non-synchronization
traffic. On the slate for the future is obtaining statistics on a pointer-chaining directory based
cache coherence scheme [76].

David  Chaiken  extended  the  work  on  directory  schemes  and  wrote  a  simulator  to  reflect  more 
details of the cache coherency protocol. This program can simulate a fully-acknowledged
protocol that guarantees sequential consistency, correctly interacting with a
rapid-context-switching processor. The simulator is being extended to handle a weak-coherence protocol
that fields processor-issued fence instructions and outstanding memory operations, and to
handle access to full/empty bit synchronization requests.
This program is intended to be part of a processor/cache/memory/network simulator that
can  be  used  to  perform  a  detailed  analysis  of  the  architecture  that  we  are  proposing. 

Results from simulations of parallel FORTRAN applications show that these limited directory
schemes do scale for some applications. But for other applications, widely shared
synchronization  objects  and  sharing  of  words  within  cache  blocks  reduce  their  performance 
to  almost  that  of  a  scheme  that  does  not  cache  shared  objects.  Consequently,  our  current 
focus  is  on  methods  for  restructuring  parallel  programs  to  exploit  caches.  Our  results  for 
applications  written  in  Mul-T  showed  relatively  much  better  performance  due  to  the  lack  of 
widespread  sharing  prevalent  in  the  FORTRAN  applications. 

A major observation was that synchronization references are another impediment to scalability.
Because a large number of processors simultaneously access synchronization variables,
excess traffic to a single hot-spot location results. Large scale, cache-coherent multiprocessors
suffer significant amounts of invalidation traffic due to such synchronization reference
patterns.  Large  multiprocessors  that  do  not  cache  synchronization  variables  are  often  more 
severely  impacted.  If  this  synchronization  traffic  is  not  reduced  or  managed  adequately, 
synchronization  references  can  cause  severe  congestion  in  the  network.  We  are  investigating 
new  scalable  synchronization  methods  that  do  not  incur  excessive  hardware  cost  [6].  A  later 
section  will  describe  our  work  that  addresses  this  issue. 

As a sampling of our results on the performance of directory schemes for cache coherence,
Figure 4.1 shows an invalidation histogram for a 64-processor simulation of Dir_N NB driven
by a trace from the SIMPLE application. Dir_N NB corresponds to a directory scheme that
uses  no  broadcasts  and  has  N  pointers.  The  graph  shows  the  histogram  of  the  number 
of  invalidations  required  during  a  write  to  a  previously  clean  block.  The  graph  shows  the 
percentage  of  writes  which  resulted  in  invalidations  to  up  to  12  caches.  Writes  resulting  in 
invalidations  of  greater  numbers  of  caches  were  proportionately  insignificant.  In  over  95% 
of  the  times  that  an  invalidation  occurred,  a  block  had  to  be  invalidated  from  no  more 
than three caches. This small number of invalidations compared to the possible maximum
(i.e.,  64)  implies  that  for  the  common  case  the  directory  need  have  just  a  few  pointers  to 
encode  the  locations  of  the  shared  blocks.  Invalidation  histograms  for  FFT  and  WEATHER 
had a corresponding figure of over 99%. Synchronization references accounted for all the
invalidations  involving  roughly  10  or  more  caches — a  definite  problem. 

On  a  further  investigation  of  the  scalability  of  cache  coherence  schemes,  we  observed  that 
with  large  block  sizes,  concurrent  accesses  of  a  block  by  several  processors  caused  excess 
invalidation traffic. For example, halving the block size to 8 bytes from 16 almost halved the
network  request  rate.  Our  current  efforts  are  aimed  at  compiler  techniques  to  make  caches 
viable by reducing this interference effect.

David Chaiken is working on semantic models for shared memory. Ongoing research includes
proving that our directory scheme implementation does conform to our definition of cache
coherence [75]. The design of a cache-directory and network communications controller, to be
used in a large scale multiprocessor, is in progress. The chief issues being addressed are: the
programmability and the implementation efficiency of various shared-memory programming
paradigms, such as strong serialization versus weak ordering, supporting full-empty bits in
the cache/memory controller, and tradeoffs in controller design to support context switching,
such as re-issuing instructions versus pipeline freezing.

Figure 4.1: Cache invalidation statistics for SIMPLE with 64 processors. The height of a bar
at x reflects the fraction of write hits to previously clean blocks that resulted in x invalidation
messages.

4.4.3  Interconnection  Networks 

We  analyzed  interconnection  network  architectures  that  can  best  exploit  the  lower  average 
traffic  intensity  of  cache-coherent  systems.  Analytical  evaluations  with  packet-switched  and 
circuit-switched networks, assuming similar speeds for the switch nodes, show that
circuit-switching can be superior to packet-switching in the medium scale (256-1000 processors).
Our  simulations  with  the  parallel  FORTRAN  traces  also  indicate  that  directories  yield  better 
processor  utilization  than  a  scheme  that  does  not  cache  shared  data.  The  relative  advantage 
of  caching  can  be  further  enhanced  by  clever  program  restructuring  to  exploit  fast  access  to 
cached  data. 

Figure 4.2 shows the processor utilization for packet- and circuit-switched networks for a
limited directory scheme with four pointers and a scheme that does not cache shared data
for the FFT application. We see that circuit switching networks yield better performance
up to about a thousand processors. Directory schemes are also superior to the non-caching
schemes, although not by much. As mentioned earlier, we are investigating methods to
improve the benefits of large coherent caches.

Figure 4.2: Processor utilization for packet and circuit switched networks for a limited
directory scheme with four pointers and a scheme that does not cache shared data. The
application is FFT.

Gino  Maa  has  written  a  fully-configurable  circuit-switched  interconnection  network  simulator 
which  is  being  used  to  model  the  performance  of  proposed  machine  architectures  and  to 
validate  the  effectiveness  of  our  analytical  performance  models.  It  allows  us  to  evaluate 
the  impact  of  alternative  processor  architectures,  cache  and  coherence  protocol  designs, 
and network topologies. A packet-switched version, being written by Sue Lee and Gino
Maa, will be operational presently, so that tradeoffs between circuit- and packet-switching
can  be  examined  via  simulation  under  various  system  configurations.  The  simulator  works 
with  both  live  and  static  frontends:  with  live  sources  such  as  an  instruction-set  interpreter, 
block/resume  handshaking  signals  are  generated  at  the  interface  to  exert  back  pressure  on 
the  execution  and  scheduling  behavior  of  the  processors.  With  static  trace  inputs,  trace 
skew statistics are collected to provide a qualitative confidence measure on the integrity of the
simulation  data.  In  either  mode,  the  network  input  traffic  may  be  “pre-filtered”  via  a  memory 
cache  simulator.  With  the  simulators,  we  are  already  collecting  useful  data  which  is  guiding 
and  confirming  our  design  choices.  One  observation  we  made  with  our  simulation  thus  far 
is  that  purely  static  backends  that  drive  network  simulations  can  be  inaccurate.  Measured 
maximum  skews  between  the  trace  generated  streams  of  various  processors  and  those  during 
network  simulations  were  over  a  million  references  for  a  total  simulation  reference  length  of 
20  million  references! 

A  VLSI  implementation  of  a  circuit-switched  network  chip  is  in  progress  under  the  direction 
of  Tom  Knight  [101].  Knight  is  also  investigating  high-density  3-D  button-board  packaging 
for  the  interconnection  network. 

4.4.4  Synchronization

We developed a new technique for efficient synchronization called adaptive backoff
synchronization [6]. A purely software approach, adaptive backoff synchronization helps
reduce network contention due to fine-grain synchronization accesses across a network. Our
technique can also help reduce hot-spot contention in large scale networks without resorting
to hardware-intensive solutions like combining networks [261] or global synchronization
logic [164]. We are also investigating the application of these adaptive backoff schemes to
reduce contention in source-responsible circuit-switched networks.

Our adaptive backoff methods use a synchronization state to reduce polling of synchronization
variables. Our simulations show that when the number of processors participating in a
barrier synchronization is small compared to the time of arrival of the processors, reductions
of 20% to over 95% in synchronization traffic can be achieved at no extra cost. In other
situations, adaptive backoff techniques result in a tradeoff between reduced network accesses
and increased processor idle time.
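
The tradeoff can be seen in a small counting model. This is not the algorithm of [6]:
the function barrier_traffic, the linear backoff rule (wait in proportion to the number
of processors still missing), and the cycle-level timing are assumptions made purely to
illustrate how barrier state can reduce polling.

```python
def barrier_traffic(arrivals, backoff):
    """Count network accesses made by processors waiting at one barrier.

    arrivals: arrival time, in cycles, of each processor.
    backoff:  function(num_still_missing) -> cycles to wait before the next poll.
    Returns (total_accesses, idle_cycles_past_release).
    """
    n = len(arrivals)
    release = max(arrivals)       # the barrier opens when the last processor arrives
    accesses = 0
    overshoot = 0
    for t in arrivals:
        accesses += 1             # the arrival itself updates the barrier count
        now = t
        while now < release:
            arrived = sum(1 for a in arrivals if a <= now)
            now += max(1, backoff(n - arrived))
            accesses += 1         # one poll of the barrier state
        overshoot += now - release
    return accesses, overshoot


def naive(missing):
    return 1                      # poll on every cycle


def adaptive(missing):
    return 10 * missing           # assumed rule: back off per missing processor


spread = [0, 10, 20, 30]          # arrivals far apart relative to processor count
naive_spread = barrier_traffic(spread, naive)
adaptive_spread = barrier_traffic(spread, adaptive)

bunched = [0, 1, 2, 3]            # arrivals close together
naive_bunched = barrier_traffic(bunched, naive)
adaptive_bunched = barrier_traffic(bunched, adaptive)
```

In this model the widely spaced arrivals see far fewer accesses with no added idle time,
while the closely bunched arrivals save only a few accesses and pay for them with
processors sleeping past the release, matching the qualitative behavior described above.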

We  are  also  studying  software  combining  [316]  to  determine  the  extent  to  which  a  directory 
cache  coherence  scheme  can  efficiently  support  fine-grain  barrier  synchronization.  By  using 
the  postmortem  scheduler  for  FORTRAN  traces,  along  with  some  additional  postprocessing 
software  to  simulate  the  effect  of  software  barrier  trees,  Kiyoshi  Kurihara  is  investigating 
methods to reduce synchronization costs in cache-coherent multiprocessors. The postprocessing
program locates and changes spin-lock addresses to simulate a combining tree effect.
Applications to both static and dynamically created barriers are being studied. To obtain
results  from  the  MACH  tracing  package,  Kiyoshi  is  modifying  the  barrier  macros  to  use 
combining  trees  and  adaptive  backoff  methods. 
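
The address-rewriting step can be sketched as follows. The trace format of
(processor, address) pairs, the tree layout, and the rule that the lowest-numbered
member of each group climbs to the parent node are assumptions; in a real software
combining tree the last arriver at a node would normally be the one to climb.

```python
from collections import Counter


def tree_addresses(pid, num_procs, fan_in, base):
    """Return the tree-node addresses that processor `pid` touches at a barrier."""
    addrs = []
    level_base, width, idx = base, num_procs, pid
    while width > 1:
        nodes = -(-width // fan_in)        # ceiling division: nodes at this level
        node = idx // fan_in
        addrs.append(level_base + node)
        if idx % fan_in != 0:
            break                          # only the group representative climbs
        level_base += nodes
        width, idx = nodes, node
    return addrs


def rewrite_trace(trace, barrier_addr, num_procs, fan_in, base):
    """Replace each reference to the single barrier address with tree-node references."""
    out = []
    for pid, addr in trace:
        if addr == barrier_addr:
            out.extend((pid, a) for a in tree_addresses(pid, num_procs, fan_in, base))
        else:
            out.append((pid, addr))
    return out


flat = [(p, 0xB0) for p in range(8)]       # eight processors hit one hot spot
tree = rewrite_trace(flat, barrier_addr=0xB0, num_procs=8, fan_in=2, base=100)
hottest_flat = max(Counter(a for _, a in flat).values())
hottest_tree = max(Counter(a for _, a in tree).values())
```

The rewritten trace contains more references in total, but no single address is touched
by more than fan_in processors, which is the effect a combining tree is meant to have on
hot-spot traffic.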

4.4.5  Processor  Design 

We are investigating novel VLSI processor architectures for large scale multiprocessor systems.
A processor called APRIL is being designed by Beng-Hong Lim and Dan Nussbaum
[207]. This processor borrows heavily from the MARCH processor design by Bert
Halstead and the Stanford MIPS-X processor [163], but differs substantially from the two.
Unlike MARCH, APRIL has hardware interlocks in the pipeline, does not interleave process
threads, and uses software thread scheduling. Unlike MIPS-X, it allows multiple hardware
contexts, and has hardware support for synchronization and Futures. The chief issues being
addressed in this design are rapid context switching, fast trap handling, high single thread
performance, hardware support for synchronization and futures, and register file organization.

An important result of our study has been identifying the specific hardware-software tradeoffs
for achieving overall high system performance. Some examples include hardware versus
software for fine-grain task management and scheduling in a multithreaded processor, and
hardware provided synchronization primitives such as fetch-and-op versus software synthesized
primitives from basic interlocked load/store instructions. We currently have a preliminary
instruction set specification. A Mul-T compiler for this processor and a detailed
simulator are also being written. Beng-Hong Lim has written an instruction-level simulator
that recently ran the ubiquitous Fibonacci program.

4.4.6  Parallel  Processing  Software 

David  Kranz’s  work  centered  around  Mul-T  [189].  Mul-T  is  a  parallel  Lisp  system,  based 
on  Multilisp’s  future  construct  [151],  that  was  developed  to  run  on  an  Encore  Multimax 
multiprocessor. Mul-T is an extended version of the Yale T system [270][271] and uses the
T system’s ORBIT compiler [188] to achieve “production quality” performance on stock
hardware, about 100 times faster than Multilisp. Mul-T shows that Futures can be implemented
cheaply enough to be useful in a production-quality system. Mul-T is fully operational,
including a user interface that supports managing groups of parallel tasks. People
at  other  universities,  labs,  and  companies  are  using  Mul-T,  and  useful  feedback  is  expected. 
(See  [189]  and  the  Parallel  Processing  Group  report  for  more  details.) 

Mul-T  is  useful  as  a  real  system  for  parallel  programming  but  suffers  because  it  is  difficult 
to do performance evaluation. We also do not want to limit ourselves to bus-based multiprocessors
such as the Encore Multimax. For large scale multiprocessors it will be necessary
to examine the effects of locality on performance. In order to get the data necessary to
investigate these issues, David Kranz re-engineered Mul-T to get T-Mul-T, described earlier.
T-Mul-T  runs  on  an  arbitrary  number  of  processors  independent  of  the  number  available  in 
the  host  multiprocessor. 

In collaboration with Susan Owicki, DEC Systems Research Laboratory, Palo Alto, we are
investigating affinity-based process scheduling techniques for improving the locality of memory
referencing in multiprocessors. This work uses analytical models of the performance
of  multiprogrammed  single  processor  caches  [8].  An  analytical  model  of  performance  of 
multiprocessor  caches  has  also  been  derived  to  be  used  in  this  study  [251]. 

A continuing effort is the development of large parallel applications. A substantial benchmark
program, SIMPLE, is being parallelized and ported to Mul-T, a parallel dialect of Lisp.
It is a finite-difference numerical analysis program from the Lawrence Livermore Lab, which
has become one of the standard benchmarks for evaluating existing and proposed high performance
computers. The parallelization can be conditionally compiled to efficiently target
a wide scale of multiprocessors (from 16 to 100’s of processors). Other programs already
developed  for  Mul-T  include  parallel  matrix  multiply,  Permute,  and  Modsim. 

4.4.7  Multiprocessor  Locality  Studies 

Caches  can  prove  beneficial  in  large  scale  multiprocessor  environments  only  if  we  can  exploit 
locality  in  multiprocessor  memory  referencing  to  a  much  greater  extent  than  we  have  been 
able  thus  far.  Our  measurement  studies  of  parallel  application  traces  confirm  this  need.  Our 
efforts  in  this  direction  are  summarized  next. 

Ongoing work aims at providing an integrated strategy to implement an efficient storage
hierarchy in shared-memory multiprocessors. Currently, parallel traces and application
programs are being used, along with the simulation tools, to analyze and characterize locality
properties in the memory access traffic. This is preparatory work with the goal to explore
ways to exploit memory access locality to reduce access latency and interconnection
bandwidth requirements.

We have a new model representing memory referencing locality in multiprocessor systems [7].
This locality model, suitable for multiprocessor cache evaluation, is derived by viewing
memory references as streams of processor identifiers directed at specific cache blocks.
This viewpoint differs from the traditional uniprocessor approach that uses streams of
addresses to different blocks emanating from specific processors. Our view is based on the
intuition that cache coherence traffic in multiprocessors is largely determined by the number
of processors accessing a location, the frequency with which they access the location, and the
sequence in which their accesses occur. The specific locations accessed by each processor,
the time order of access to different locations, and the size of the working set play a smaller
role in determining the cache coherence traffic, although they still influence intrinsic cache
performance. Gino Maa has some initial results that show that these processor references
directed to a memory block display the LRU stack property. If we succeed in showing this
is indeed true across a large set of parallel applications, then the abundant literature on
LRU stack evaluation for single processors can be straightforwardly used in evaluation of
multiprocessor performance.
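
The processor-stream view described above can be illustrated with a small sketch. It is not the model of [7]; it only shows, under assumed names and a toy trace, how one might keep one LRU stack of processor identifiers per block and record the stack distance of each reference. The function name, the trace format, and the sample data are all hypothetical.

```python
from collections import defaultdict

def processor_stack_distances(trace):
    """For each (processor, block) reference, return the LRU stack distance
    of the processor within that block's own reference stream.

    Distance 1 means the same processor made the previous reference to the
    block; None marks a processor's first reference to the block.
    """
    stacks = defaultdict(list)   # block -> processors, most recent first
    distances = []
    for processor, block in trace:
        stack = stacks[block]
        if processor in stack:
            depth = stack.index(processor) + 1
            stack.remove(processor)
        else:
            depth = None
        stack.insert(0, processor)   # the referencing processor moves to the top
        distances.append(depth)
    return distances

# Toy trace (assumed data): pairs of (processor id, block name).
trace = [(0, "A"), (1, "A"), (0, "A"), (0, "A"), (2, "B"), (1, "A"), (2, "B")]
print(processor_stack_distances(trace))   # [None, None, 2, 1, None, 2, 1]
```

A histogram of these distances over a real trace is the kind of quantity that existing uniprocessor LRU stack analyses take as input.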

4.4.8  Multiprocessor  Performance  Modeling  and  Evaluation 

Analytical models of computer performance become ever more important as we scale
multiprocessors to hundreds or thousands of processors, where the computational needs of
simulations far exceed the resources available to us now. In addition to the simulation and
analytical modeling systems described in the previous sections, we developed the following
performance evaluation models.

We developed a model of multiprocessor cache performance when coherence is enforced by the
software [251]. A similar model for the performance of hardware-enforced cache coherence
that takes into account the effects of the increase in invalidations as more processors are
added is being developed in collaboration with Susan Owicki.

We have implemented several analytical models of network performance to predict effective
processor utilization, taking into account delays due to cache and network accesses and
contention. The models are driven with access rates and access sizes measured from our
benchmark traces. These models allow quick estimation of performance measures for various
network configurations and numbers of processors.

Minor Huffman extended our trace compaction technique based on a model of the spatial
locality in programs [5] to improve both the compaction rate and cache performance simulation
accuracy [165].

5.1  Introduction  and  Overview

Our  group  is  interested  in  general  purpose  parallel  computation.  Our  approach  is  centered 
on: 


•  Declarative,  implicitly  parallel  languages. 

•  Dataflow architectures, which are scalable because of their tolerance of increased
memory latencies and support for frequent synchronization. Our vehicles for research
include an abstract “Explicit Token Store” architecture (ETS), a hardware prototype
implementation of ETS (Monsoon), various software emulators (Gita, MINT), a new
proposed architecture called P-RISC, and a software emulator for it.

•  Sophisticated compiling and runtime systems for Id, both for dataflow and other
architectures. We have also explored the use of dataflow compiling for an experimental
persistent programming language to tolerate disk latencies by exploiting parallelism.

•  Applications  programs  to  guide  the  language,  compiler,  and  architecture  research. 


Our  main  research  vehicle  for  programming  languages  is  Id,  which  is  a  mostly  functional 
programming  language.  We  completed  the  basic  type  system  and  are  exploring  the  use  of  a 
simplified  version  of  a  new  overloading  mechanism  due  to  Phil  Wadler  [306].  Id  is  a  nonstrict 
language  for  more  parallelism,  but  nonstrictness  is  not  achieved  via  laziness,  as  is  usually 
the  case.  Instead,  we  have  explored  the  implications  of  using  explicit  constructs  for  lazy 
evaluation  to  deal  with  infinite  structures.  For  nondeterministic  access  to  shared  state,  we 
have  developed  a  new  construct  called  a  “manager”  that  is  similar  to,  but  more  flexible  than 
monitors  and  also  allows  more  concurrency.  We  have  also  explored  a  few  other  experimental 
language  designs:  a  language  with  naming  environments  as  first  class  objects,  and  a  language 
for  signal  processing.  Our  group  is  well  represented  in  the  international  committee  that  is 
designing  the  new  functional  programming  language  Haskell. 

On  the  more  theoretical  side,  we  have  formalized  Id’s  operational  semantics  using  rewrite 
rules,  and  have  been  able  to  prove  results  about  determinacy  and  to  be  more  precise  about 
such  concepts  as  termination,  errors,  etc.  We  have  also  studied  optimal  interpreters  for  the 
lambda-calculus. 

We have ported a subset of Id World, our programming environment for Id, to the UNIX
environment. This should make Id available to a much larger audience. The UNIX version
lacks the graphics of the original Lisp Machine version; this work remains to be done.

Last year, we reported that our research results had reached a level of maturity where we
were ready to embark on the construction of a real dataflow machine within the next few
years using the Monsoon processor architecture (Project Dataflow). Towards that end, we
held a meeting in March 1988 with prospective industrial partners. Over the last year,
Motorola has emerged as our partner; they are setting up a Cambridge research laboratory,
and will participate actively in the construction of the Monsoon system.

A wire-wrap prototype of a processor using the Monsoon dataflow architecture has been
running small handcoded programs since September 1988, and compiled code since
December. It has been used to guide the design of the printed-circuit Monsoon board (part
of  Project  Dataflow).  We  continued  to  make  progress  on  the  design  and  implementation  of 
the  Monsoon  interconnection  network,  consisting  of  PaRC  switching  chips  and  high  speed 
data  links.  We  have  begun  work  on  the  design  of  an  I-structure  memory  board  for  Monsoon. 

We  have  incorporated  more  optimizations  in  the  Id  compiler,  and  are  moving  its  target  away 
from  the  Tagged  Token  Dataflow  Architecture  to  an  Explicit  Token  Store  model  (ETS), 
of  which  Monsoon  can  be  considered  a  specific  implementation.  We  began  to  look  very 
seriously  at  the  runtime  system  and  the  control  of  parallelism  in  Id  programs  for  better 
resource  management,  and  have  implemented  several  experimental  mechanisms  to  that  end. 

Our  repertoire  of  Id  applications  continues  to  grow  and  includes  DNA  sequence  analysis, 
airport  landing  approach  planning,  computational  fluid  dynamics,  image  processing,  and 
simulated  annealing. 

Our  architecture  research  has  also  moved  further  in  the  direction  of  achieving  a  synthesis 
between  von  Neumann  and  dataflow  ideas.  We  proposed  a  new  architecture  called  P-RISC 
(for  “Parallel  RISC”),  and  have  begun  simulation  and  compilation  studies. 

Based  on  the  I-structure  notation  in  Id,  we  have  designed  a  “functional  database  language,” 
in  which  data  do  not  change — update  transactions  specify  new  versions  of  a  database.  We 
are  implementing  this  database  language,  using  ideas  from  P-RISC  compilation  to  exploit 
parallelism  to  hide  disk  latencies. 

5.2  Personnel 

After finishing his Ph.D. thesis in August 1988, Greg Papadopoulos became a member of
the research staff, working as the chief architect for the Monsoon prototype processor in
Project Dataflow.

After completing his Ph.D. with Paul Hudak at Yale, Jonathan Young joined us as a member
of the research staff, working on the compiler backend and runtime system for Monsoon. His
research is in compile-time semantic analysis and optimization of functional programs.

Paul  Johnson  joined  our  research  staff  and  has  been  working  on  porting  the  existing  Id  World 
to  UNIX  machines. 

Arthur Altman joined CSG in January 1989 as a visiting researcher from Texas Instruments,
to study the dataflow approach to programming languages and architectures when applied
to problems in image understanding.

After finishing his Ph.D. in May 1988, Ken Traub stayed on as a research staff member.
In early 1989, he joined the Cambridge Research Center of Motorola, Inc., the industrial
partner on the Monsoon project.

It  is  with  great  sadness  that  we  record  the  passing  of  Bhaskar  Guha  Roy  on  March  23,  1989. 
He  worked  first  with  Jack  Dennis  and  later  with  Prof.  Nikhil.  He  fought  an  incredibly 
courageous,  year-long  battle  against  liver  cancer,  during  which  he  managed  to  write  his 
Ph.D.  thesis  proposal  and  set  up  his  committee. 


5.3  Programming  Languages 

5.3.1  Id 

In September, we released the reference manual for Version 88.1 of the Id programming
language [244], which augmented the language with constructs for loop bounding.

5.3.2  Types  and  Overloading 

During the summer and fall of 1988, Shail Aditya revised and upgraded the type checking
system  of  the  Id  compiler  to  incorporate  changes  from  Id’87  to  Id’88.  This  involved  the 
addition  of  several  key  features  to  type  analysis,  viz.,  algebraic  data  types,  constructor  case 
analysis  and  abstract  data  types.  Further,  the  type  checker  was  made  totally  incremental  at 
the  procedural  level.  Thus,  in  the  version  currently  installed,  the  user  can  compile  individual 
procedures  interactively  from  the  editor,  in  any  order.  The  type  checker,  installed  as  a 
module  in  the  Id  compiler,  incrementally  assembles  enough  information  to  check  the  type 
consistency  of  the  accumulated  program  at  each  interactive  step.  Using  this  information, 
the runtime environment is able to double check the type consistency of all the procedures
in  the  invocation  graph  just  before  execution.  The  user  is  notified  in  case  of  any  discrepancy 
and  the  appropriate  section  of  the  program  can  be  corrected  and  recompiled. 

During the winter and spring of 1989, Shail Aditya worked on a mechanism for the resolution
and compilation of overloaded operators and general user-defined identifiers. The idea is
a simplification of the system proposed by Wadler and Blott [306], which has been adopted
in Haskell. Unlike previous overloading schemes, this one is not ad hoc. It is capable
of expressing “recursive overloading”, e.g., if “+” is already overloaded on integers and
floats, then it can also be overloaded to mean addition of lists of integers and floats and,
inductively, on lists of lists of integers and floats, etc. There is a systematic way of resolving
this overloading.

The type checker with overloading resolution is currently under test with regard to
efficiency of compilation and execution. We are conducting experimental tests with existing
Id programs, including large scientific codes such as SIMPLE. It will be installed in the Id
compiler in the near future. The proof of consistency of the incremental type system and the
details of the overloading mechanism are due to appear in Shail’s forthcoming S.M. thesis.

The straightforward resolution of overloading results in some inefficiency because a procedure
that uses the symbol “+” is implemented as one that receives an addition function as a
parameter, which is applied using a general function call. It remains to be seen how this
can be optimized through a process called “specialization”, where separate versions of the
procedure are compiled, one for each implementation of “+” that is of interest.
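
The trade-off just described can be sketched in a few lines. This is an illustration only, written in Python rather than Id, and it is not the Id compiler's mechanism: the type-description encoding and all function names are hypothetical. It shows the inductive resolution of “+” on nested lists, the general translation in which “+” arrives as an extra parameter, and a hand-specialized version for integers.

```python
import operator

def resolve_add(type_desc):
    """Return an addition function for a type description.

    type_desc is "int", "float", or ("list", element_desc). This encoding is
    hypothetical; it stands in for the compiler's static type information.
    """
    if type_desc in ("int", "float"):
        return operator.add
    if isinstance(type_desc, tuple) and type_desc[0] == "list":
        add_elem = resolve_add(type_desc[1])          # inductive case
        return lambda xs, ys: [add_elem(x, y) for x, y in zip(xs, ys)]
    raise TypeError(f"'+' is not overloaded on {type_desc!r}")

def double_general(add, x):
    """Straightforward translation: '+' is passed in and called generally."""
    return add(x, x)

def double_int(x):
    """Specialized version for integers: '+' is applied directly."""
    return x + x

add_ll = resolve_add(("list", ("list", "int")))
print(double_general(add_ll, [[1, 2], [3]]))   # [[2, 4], [6]]
print(double_int(21))                          # 42
```

The general version pays for one extra parameter and an indirect call on every use; the specialized version avoids both at the cost of compiling one copy per type of interest.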

5.3.3  Lazy  Evaluation 

Id has nonstrict semantics, which means that a procedure or data constructor application can
produce a value before the values of its arguments are known. Traditionally, languages with
nonstrict  semantics  have  been  implemented  using  lazy  evaluation,  where  nothing  is  evaluated 
until  it  is  known  that  it  is  needed  for  the  result.  Unfortunately,  when  an  expression  is  needed, 
a  lazy  evaluator  would  have  already  paid  the  overhead  of  building  a  closure  for  the  expression 
and  rescheduling  it.  Further,  it  would  have  lost  the  opportunity  of  evaluating  it  concurrently 
with  other  computations.  For  these  reasons,  we  choose  not  to  use  lazy  evaluation  in  Id. 

However,  lazy  evaluation  can  be  very  useful  for  programming  with  infinite  structures  (e.g., 
streams),  and  for  large  data  structures  of  which  only  a  small  part  is  actually  used.  Steve 
Heller  completed  his  Ph.D.  thesis  in  January  1989,  in  which  he  investigated  the  design, 
use  and  implementation  of  explicitly  designated  lazy  data  structures  in  Id  [158].  Heller 
and  Jamey  Hicks  implemented  lazy  data  structures  in  the  graph  interpreter  (Gita)  based 
on some preliminary work of UROP student Chuck Fabian. Heller was able to show that, of
the numerous examples of applications in the literature that used lazy evaluation, most
needed only nonstrictness, not laziness. The few instances where laziness was actually
necessary  were  easy  to  identify,  and  it  was  quite  easy  to  use  the  explicit  lazy  data  structures 
in  Id.  Jonathan  Young  and  Hicks  implemented  a  restricted  version  of  lazy  data  structures 
on  Monsoon  (four  states  instead  of  five  states  in  the  state  diagram,  since  Monsoon  only  has 
two  status  bits).  Lazy  data  structures  are  being  used  to  implement  global  constants  and  for 
stream  programming,  and  have  also  been  used  in  system  code  for  memory  allocation. 
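
As a rough illustration of explicitly designated laziness for stream programming, the sketch below (in Python, not Id) delays only the tail of a stream and evaluates it at most once. The class and function names are hypothetical, and the two-state cell shown here is simpler than the four- and five-state designs mentioned above.

```python
class Lazy:
    """An explicitly delayed value: computed at most once, on first demand."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()
            self._compute = None      # drop the closure once evaluated
            self._done = True
        return self._value

def integers_from(n):
    """Infinite stream: an eager head paired with an explicitly lazy tail."""
    return (n, Lazy(lambda: integers_from(n + 1)))

def take(k, stream):
    """Return the first k elements, forcing only the tails that are needed."""
    out = []
    while k > 0:
        head, tail = stream
        out.append(head)
        k -= 1
        if k > 0:
            stream = tail.force()
    return out

print(take(5, integers_from(10)))   # [10, 11, 12, 13, 14]
```

Only the tail is marked lazy; everything else in the program remains free to be evaluated eagerly and concurrently.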

5.3.4  Managers 

Paul Barth continued his research on managers, a construct for supporting nondeterministic
computation in Id. Nondeterministic constructs are needed for state-sensitive computation,
including “application” programs, such as real time systems and database systems that
respond to multiple inputs according to their temporal order. They are also necessary
for “systems” programs, such as runtime support for the implementation of a functional
language, which need to manipulate the state of the machine.

The manager construct was redesigned to facilitate programming abstraction and efficient
implementation. Rather than stream functions, managers have been recast as abstract data
types, with operators that access and update a shared state. This is beneficial from two
standpoints. As a programming construct, this makes the nondeterminism explicit while
encapsulating the state transformation. Each potentially nondeterministic operator is easily
identified, and can be written as a function from old state to new state.

Managers are similar to monitors, but allow much more flexibility in scheduling the queues of
waiting processes, and allow much more concurrency between state-manipulating procedures.

From  an  efficiency  point  of  view,  the  new  paradigm  allows  mutual  exclusion  to  be  provided 
by hardware primitives rather than stream operations. These primitives, called locks, are
an  extension  of  I-structure  operations  that  provide  efficient  mutual  exclusion  on  individual 
memory  cells.  The  design  of  locks  (developed  jointly  by  Barth,  Soley,  and  Steele)  is  currently 
being  filed  for  patent.  The  new  manager  construct  is  fully  described  in  CSG  Memo  294. 

Managers  were  incorporated  into  the  compiler,  and  applications  were  developed,  including 
the  dining  philosophers  problem,  a  shared  bank  account  (with  deferred  debits),  a  printer 
scheduler, a buddy system memory allocator, and a union-find set algorithm. These
examples indicated that the new design was more perspicuous and efficient than stream-based
managers.
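
The abstract-data-type view of a manager can be sketched as follows. This is a Python analogy, not Id syntax or the design of CSG Memo 294: a software lock stands in for the hardware lock primitive, each operator is written as a function from old state to new state plus a result, and all names are hypothetical. The account here is a plain one without the deferred debits of the application mentioned above.

```python
import threading

class Manager:
    """Shared state reachable only through operators of the form
    old_state -> (new_state, result)."""

    def __init__(self, initial_state):
        self._state = initial_state
        self._lock = threading.Lock()    # stands in for a hardware lock cell

    def apply(self, operator, *args):
        with self._lock:                 # mutual exclusion around the update
            self._state, result = operator(self._state, *args)
        return result

def deposit(balance, amount):
    return balance + amount, None

def withdraw(balance, amount):
    if amount > balance:
        return balance, False            # refused; state unchanged
    return balance - amount, True

def balance_of(balance):
    return balance, balance

account = Manager(0)
workers = [threading.Thread(target=account.apply, args=(deposit, 10))
           for _ in range(50)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(account.apply(balance_of))         # 500
```

The order in which the fifty deposits are applied is nondeterministic, but each one is an isolated state transformation, so the final balance is the same in every schedule.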

5.3.5  Other  Language-related  Work

Sequential  Implementations  of  Nonstrictness

Ken  Traub’s  work  on  sequential  implementation  of  nonstrict  programming  languages  has 
continued, resulting in a paper presented at the Aspenäs Workshop on the Implementation
of Lazy Functional Languages in Göteborg, Sweden. The paper is also to be presented at
the  1989  Conference  on  Functional  Programming  Languages  and  Computer  Architecture  in 
London. 


Symmetric  Lisp 

Suresh Jagannathan completed his Ph.D. thesis [169] on Symmetric Lisp, a novel parallel
programming language in which naming environments (called maps) are first class objects.
Through numerous programming examples, he was able to show that many diverse
programming paradigms and constructs from other languages can be expressed quite elegantly
with just the map construct. Examples include records, LET and LETREC blocks,
“object-oriented” programs, file systems and directories, etc.

Using  a  single  construct  (the  map),  both  as  a  data  structure  as  well  as  a  control  structure, 
raises  some  interesting  questions  about  formal  properties  of  programs,  because  names  are 
used  both  as  program  variables  and  as  field  selectors.  For  example,  in  the  expression: 

(with M e)

a free name x in e is looked up in M if M is a map with a field x; otherwise, it is looked up in the
surrounding lexical environment. Jagannathan developed an inference algorithm to produce
statically a conservative approximation that predicts which environment a name will be
looked up in. A compiler could use this information to compile name lookup
efficiently. He also showed an implementation of Symmetric Lisp in terms of a translation
to dataflow graphs.
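
The lookup rule for the with form can be made concrete with a small sketch. It is written in Python with dictionaries standing in for maps and is not Symmetric Lisp itself; the function names and the way the body is represented are hypothetical. It resolves every name dynamically, which is exactly the cost that a static prediction of the environment would let a compiler avoid.

```python
def lookup(name, scopes):
    """Resolve a name through a chain of scopes, innermost first."""
    for scope in scopes:
        if name in scope:
            return scope[name]
    raise NameError(name)

def eval_with(m, body, lexical_scopes):
    """Evaluate body as in (with M e): names are tried in the map m first,
    then in the surrounding lexical scopes. The body is modeled as a
    function that receives a name resolver."""
    scopes = [m] + list(lexical_scopes)
    return body(lambda name: lookup(name, scopes))

lexical = [{"x": 1, "y": 2}]
M = {"x": 100}                      # M has a field x but no field y
print(eval_with(M, lambda get: get("x") + get("y"), lexical))   # 102
```

Here x is found in M and y falls through to the lexical environment, so the result is 100 + 2.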

Optimal  Interpreters  for  Lambda  Calculus


Vinod Kathail continued his investigation of optimal interpreters for the λ-calculus and
functional languages based on the λ-calculus. The work in the last year focused on two
aspects of the interpreter we had developed: formally proving its correctness and optimality,
and simplifying its exposition. To relate our interpreter to the λ-calculus, we developed a
new term calculus, which captures some of the essential features of the way the substitution
operation of the λ-calculus is implemented in our interpreter. The term calculus is used as
an intermediate step in proving the correctness of our interpreter; however, it may be of
interest in its own right. We are in the process of completing the formal proofs [175].


PGL,  A  Signal  Processing  Language 


Janice Onanian completed a Master’s thesis in spring 1989, in which she developed a high
level signal processing language, called PGL, and a program graph representation for
coarse-grain multiprocessors. Effective use of parallel processors requires dividing an application
into concurrently executable tasks and assigning those tasks to processors such that their
use of the network resources is optimized. We plan to use the language and graph
developed in the thesis to find an optimal partitioning of an application into parallel tasks for a
given hardware configuration. This involves two efforts: the development of algorithms for
evaluating a task partition denoted by the program graph; and finding the optimal partition
by varying the parameters to the program graph. Implementation of the PGL compiler is
targeted for summer 1989, and development of the evaluation and optimization algorithms
is planned to form the basis for subsequent doctoral research.


Haskell,  A  New  Functional  Programming  Language 


Arvind and Nikhil have continued to participate in the design of the new functional
programming language, Haskell. As reported last year, Haskell is being designed by a group of about
20 functional programming researchers from three continents. A draft of the language report
was released to the public for comments in December 1988, which was followed by extensive
discussion on the FP (functional programming) mailing list. The Haskell committee then
met again in Mystic, CT in May 1989, where we charted the design decisions and actions to
be taken before the final report is released in July 1989.


5.4  Id  World:  The  Id  Programming  Environment 

During the fall, R. Paul Johnson implemented a suite of interface functions designed by
Richard Soley for Gita, the graph interpreter. This suite of functions, known as the Id
World Interface (IWI), will support a variety of Id World user interfaces. Id World Version
4.0 and earlier only provided a Lisp Machine-specific graphical interface. Id World Version 4.1
includes a portable Common Lisp-based command listener. An X Window-based interface
is under development. With the assistance of Jamey Hicks, Johnson released Version 4.0
for internal testing in early December. Version 4.0, with support for Symbolics Genera
7.1/7.2 and TI Explorer 3.2/4.1, was shipped in January. Highlights of Version 4.0 include
an optimizing compiler for Id 88.1, Id Mode Zmacs editor support, and the Gita graph
interpreter with support for top level constants. Version 4.1, which adds support for Lucid
Common Lisp Version 3 on Sun Workstations, was released externally for beta test in late
March.

The  next  version  of  Id  World  will  have  greater  separation  between  modules  than  in  the 
current  version,  so  that  each  piece  may  be  run  separately  in  a  UNIX  environment  as  opposed 
to  being  tied  to  the  Lisp  Machine  implementation.  In  addition,  Hicks  has  been  meticulously 
documenting  the  internals  of  the  runtime  managers  and  the  compiler  schemata  used  in  the 
current  system,  as  well  as  some  of  the  desirable  hacks  on  the  new  hardware. 

5.5  Project  Dataflow:  The  Monsoon  Prototype  System 

5.5.1  The  Monsoon  Processing  Element 

A  very  exciting  milestone  was  met  in  September  of  1988  when  a  single  processor  Monsoon 
prototype  was  made  operational,  able  to  execute  incrementally  compiled  Id88  programs.  The 
prototype  implementation  was  engineered  by  Jack  Costanza  and  Ralph  Tiberio  in  compliance 
with  the  Monsoon  microarchitecture  specification  developed  by  Greg  Papadopoulos  [253]. 

The Monsoon prototype is a 64-bit, fully pipelined (eight stages) dataflow processor.
Constructed from off-the-shelf components on a single large wire wrap panel (9U x 600mm), the
processor  processes  a  modest  four  million  tokens  per  second  or  approximately  three  dataflow 
MIPS  of  which  any  proportion  can  be  double  precision  floating  point.  The  processor  board 
is  enclosed  in  a  custom  cabinet  with  suitable  power  supply  and  cooling,  and  then  connected 
via  ribbon  cables  to  a  simple  NuBus  interface  card  hosted  in  a  Texas  Instruments  Explorer 
Lisp  Machine. 

Hardware verification and debugging were facilitated by two design disciplines. First, we
performed thorough timing simulations of the entire board on our Mentor design tools.
During simulation we executed small dataflow graphs to verify overall operation and focused
specifically on various matching operations and token enqueuing sequences. The second design
discipline was to employ scan paths for (almost) all internal state. In scan path design, each
parallel register can have its contents read and written through a special serial path, and
multiples of such registers have their serial paths concatenated and then looped back to form
a large scan ring. Any bit of processor state can be accessed by shifting these serial registers.
Finally, the scan rings can be read and written through NuBus operations performed by the
host Lisp Machine.

The prototype processor comprises over 800 bits of scannable state. Software on the host Lisp
Machine interprets and displays the processor state in a full screen format, with appropriate
data conversions (e.g., floating point) and mnemonics (e.g., opcodes, field decodings). The
prototype processor clock can also be single stepped under host control, and by repeatedly
stepping the clock and scanning state a full suite of software breakpoint conditions can be
established. In essence, we used the combination of scan path design and host software
to develop a sophisticated in-system logic analyzer. We found this to be a very effective
debugging technique.
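
The scan ring idea can be modeled in software with a short sketch. It is a behavioral toy in Python, not the Monsoon hardware or its host software: the class name, the register widths, and the bit ordering are assumptions made for illustration. Registers are concatenated into one serial loop, reading is a full rotation that feeds each bit back in, and writing shifts a new image through the loop.

```python
class ScanRing:
    """Registers whose bits are chained into one serial loop."""

    def __init__(self, widths):
        self.widths = list(widths)
        self.bits = [0] * sum(self.widths)   # bits[0] is the next bit shifted out

    def shift(self, bit_in):
        """Shift the ring by one position and return the bit that falls out."""
        bit_out = self.bits.pop(0)
        self.bits.append(bit_in)
        return bit_out

    def read_all(self):
        """Rotate the ring once, feeding each bit back so state is preserved."""
        seen = []
        for _ in range(len(self.bits)):
            seen.append(self.shift(self.bits[0]))
        return seen

    def write_all(self, new_bits):
        """Replace the whole state by shifting in a new image."""
        if len(new_bits) != len(self.bits):
            raise ValueError("image must cover the whole ring")
        for bit in new_bits:
            self.shift(bit)

    def register(self, index):
        """Bits of one register, located by its offset in the ring."""
        start = sum(self.widths[:index])
        return self.bits[start:start + self.widths[index]]

ring = ScanRing([4, 2, 3])                    # three registers, nine bits
ring.write_all([1, 0, 1, 1, 0, 1, 1, 1, 0])
print(ring.register(1))                       # [0, 1]
print(ring.read_all())                        # [1, 0, 1, 1, 0, 1, 1, 1, 0]
```

A host-side breakpoint in this model would be a loop that steps the clock, calls read_all, and tests a predicate on the decoded registers.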

The Monsoon prototype forms the basis for the production Monsoon processor, a printed
circuit board version to be manufactured by Motorola. Several improvements are incorporated
in the production version.


•  A  network  port  based  on  the  PaRC  and  link  chips  is  added  to  permit  the  construction 
of  multiple  processor  systems. 

•  A  set  of  exception  mechanisms  and  more  complete  support  for  system  programs  (e.g., 
loader,  garbage  collector)  have  been  designed. 

•  The  host  interface  has  been  changed  from  NuBus  to  VME  and  a  high  bandwidth  DMA 
path  has  been  added  from  the  host  into  Monsoon  frame  store. 

•  The instruction format has been changed slightly to permit a wider opcode field (from
10 bits at present to 12 bits), and variant formats are introduced that allow either two
explicit destinations or a large absolute address displacement (20 bits).

•  Much of the datapath has been byte sliced into 10,000 gate CMOS arrays (eight
identical slices), and the specialized ALU function that manipulates tags (the Pointer
Increment Unit) has been cast by George Wang into a similar sized array.

•  The  pipeline  rate  has  increased  to  ten  million  tokens  per  second,  approximately  seven 
million  dataflow  instructions  per  second. 

•  The board size has been reduced from 9U x 600mm to 9U x 400mm (“Sun size”)
through the use of gate arrays and surface mount assembly.


The  production  processor  is  in  the  final  detailed  design  and  simulation  phase.  We  expect  to 
hand  off  the  design  to  Motorola  by  June  1989. 

5.5.2  The  Interconnection  Network  for  Monsoon 

Andy Boughton, Chris Joerg, and John Santoro continued their work on the network for
Monsoon. We have continued to develop the two chips that will be used in the network, the
Packet Routing Chip (PaRC) and the Data Link Chip (DLC). PaRC is a four input four
output packet router on a chip and is the primary component of the Monsoon network. DLC
contains a data link transmitter and a data link receiver. The transmitter will allow a PaRC
output port to be connected to an interboard cable and the receiver will allow an interboard
cable to be connected to a PaRC input port.

Joerg has continued the development of PaRC; the design has not changed significantly over
the past year. Some work has been done to enhance the statistic collection abilities. Also,
some improvements were made to the control port of PaRC. The control port is the section
that allows a local controller to control several parameters of the chip’s operation (such as
how to do routing and what to do when errors are seen). Most of the work done on PaRC
has involved creating test vectors. These vectors will be used to ensure that fabricated chips
do not contain any functional defects.

Santoro continued the development of DLC. During the past year, we have used the
preliminary logic design completed last year to develop a detailed design for DLC in Motorola’s
Mosaic II ECL gate array technology.

The top level design of DLC has changed somewhat during the year. The primary change
was the elimination of 4 into 6 encoding. Our original design called for the encoding of all
data transmitted over interboard cables. The primary advantage of this encoding was the
elimination of the DC component of the transmitted signal. However, encoding required
that the DLC be designed to operate on a 50% faster clock, since six bits must be sent for
every four bits of data. Designing DLC for such a clock turned out to be a fairly difficult
task. Faced with this difficulty, we ran a large number of tests on our proposed drivers,
receivers, and cable to determine whether data could be reliably transmitted without
encoding. Our tests indicated that a data pattern containing an arbitrarily long sequence
of 0's followed by a 1 and another long sequence of 0's could be transmitted over the cable
with more than sufficient noise immunity. The inverse pattern also worked. Based on these
tests we elected to simplify the design of the DLC by removing encoding.
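
The trade-off behind 4 into 6 encoding can be illustrated with a short sketch. The report
does not give the code table that was planned for DLC, so the table below is an assumption:
it assigns each 4-bit value one of the 20 six-bit words that contain exactly three 1's. Any
such table removes the DC component, at the cost of sending six bits for every four. The
names BALANCED_WORDS, encode, decode, and disparity are illustrative only.

```python
from itertools import combinations

# Hypothetical code table: the report does not specify DLC's actual 4-into-6 code.
# Every 6-bit word with exactly three 1's is DC balanced; there are C(6,3) = 20 of
# them, enough to cover the 16 possible 4-bit values.
BALANCED_WORDS = sorted(
    sum(1 << bit for bit in ones) for ones in combinations(range(6), 3)
)
ENCODE = {nibble: BALANCED_WORDS[nibble] for nibble in range(16)}
DECODE = {word: nibble for nibble, word in ENCODE.items()}


def encode(nibbles):
    """Map each 4-bit value to its balanced 6-bit codeword."""
    return [ENCODE[n] for n in nibbles]


def decode(words):
    """Invert encode(); raises KeyError for a word outside the table."""
    return [DECODE[w] for w in words]


def disparity(words, width):
    """Number of 1 bits minus number of 0 bits over a sequence of words."""
    ones = sum(bin(w).count("1") for w in words)
    return 2 * ones - width * len(words)


if __name__ == "__main__":
    data = [0x0] * 8 + [0x1] + [0x0] * 8   # long run of 0's around a single 1
    print("raw disparity:    ", disparity(data, 4))
    print("encoded disparity:", disparity(encode(data), 6))
    print("clock ratio:      ", 6 / 4)
```

The raw pattern is heavily unbalanced while the encoded stream has zero disparity, which is
the property the original design relied on before the cable tests showed it was unnecessary.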

The  detailed  design  of  DLC  has  been  completed  and  simulated.  Test  vectors  have  been 
written  which  are  sufficient  for  testing  fabricated  chips  for  faults.  A  preliminary  version  of 
the  design  has  been  transferred  to  Motorola.  The  final  version  of  the  design  should  be  given 
to  Motorola  before  June  30,  1989. 

5.5.3  The  I-structure  Memory  Board 

During the spring, Ken Steele began work on a hardware I-structure controller design that
will implement I-structures and the new memory operations developed for the Monsoon
prototype. Each board is expected to provide 4MW (64 bits/word) and to be capable of
handling up to five million requests per second through an onboard PaRC chip.

5.5.4  MINT:  a  Monsoon  Simulator 

Andy  Shaw  and  Jonathan  Young  implemented  a  simulator  for  the  Monsoon  architecture 
which  proved  to  be  an  invaluable  tool  for  debugging  the  hardware.  For  his  S.B.  thesis, 
Shaw  then  extended  this  project  into  a  complete  interpreter  that  is  capable  of  mimicking 
the  hardware  with  great  precision.  The  intent  is  that  any  object  code  that  runs  on  Monsoon 
will  run  without  modification  on  MINT.  The  design  is  very  modular,  and  uses  the  Monsoon 
microcode compiler described below.

Since we wish to accurately simulate the processor, regardless of its current microcode, a
microcode to Common Lisp compiler was designed and coded by Derek Chiou. The compiler
accepts Monsoon microcode and translates it into Common Lisp comparable to hand code
in efficiency. Secondary considerations were human readability and code size. Thus, the
identical microcode specification used to drive the actual hardware is also compiled for the
simulator, with obvious benefits of hardware/simulator consistency. The compiler is flexible
enough to adapt to any foreseeable microcode changes. The compiler is written in Common
Lisp.

5.6  Implementations of Id

5.6.1  Id on Monsoon

Jonathan Young and Jamey Hicks spent most of their time this year porting the existing
Id compiler to Monsoon (with some initial work by Bradley Kuszmaul). This enables us to
run real programs on the Monsoon wire wrap prototype. We now have a working Monsoon
compiler, as well as a loader, a runtime system, an execution manager, and a rudimentary
debugger; the standard libraries have also been ported.

While much of the work of porting the compiler was easy because the Monsoon ETS
architecture strongly resembles the previous TTDA architecture, the runtime system required
major work. The Gita simulator relied on the storage management of the Lisp Machines;
on Monsoon we implemented handcoded managers for free lists, frames for procedure calls,
and two different heaps. We also implemented managers for I-structures and semaphores
("locks") to tide us over until we have a working I-structure memory board. In addition,
special managers were needed to support particular language features such as delays and
accumulators.

Using the execution manager, the user may now call any Id function which has been compiled
and loaded into Monsoon with as many arguments as desired. Execution is currently limited
to either "run until done" or a general single stepper, in which all eight stages of the Monsoon
processor pipeline are visible. After a program error has been detected, various tools allow
the user to view waiting tokens, data structures, and instruction memory.

Barth and Young developed a graph browser, and Doug Stetson improved the display
heuristics. The browser proved to be very useful for debugging both the compiler and Id
programs, and it has been installed in the Monsoon system. When compiling, the graph of a
procedure is optionally displayed; it is possible to view the graph of a procedure with its
waiting tokens after a partial execution on Monsoon.

5.6.2  Storage Management

Ken Steele wrote microcode for the processor prototype to simulate I-structures on the
processor memory. Also, new instructions were created to support the runtime environment
and compiler. These included non-busy waiting locks and support for lazy evaluation. A
patent has been applied for on the non-busy waiting locking mechanism.

A storage management system was implemented in Id for dynamic allocation and deallocation
of structure memory and frame memory. Stephen Brobst implemented a buddy system
algorithm using nondeterministic lock and unlock primitives, which were provided as
extensions of the Id language as a result of work done by Barth. Multiple instantiations of the
allocation and deallocation routines can proceed in parallel, with suspension occurring only
when two allocations attempt to allocate blocks of the same size. Fast path execution of
the memory allocation routine requires less than 50 RISC-like instructions for its critical
path. Young ported the buddy system to the Monsoon architecture and augmented the
storage management system with stack-based allocation mechanisms for cons cell and
fixed-size frame memory allocation. Brobst has also written an Id version of the first-fit algorithm
and is experimenting with various granularities for free list management.
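
For readers unfamiliar with the buddy system, the following is a minimal sequential sketch
of the technique in Python. It is not the Id implementation described above: the class name,
method names, and use of one free set per block size are assumptions made for illustration,
and the lock and unlock primitives that let the Id version run allocations in parallel are
omitted.

```python
class BuddyAllocator:
    """Sequential sketch of a buddy-system allocator over 2**max_order words."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[k] holds the start addresses of free blocks of size 2**k.
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)
        self.allocated = {}  # start address -> order of the allocated block

    def allocate(self, size):
        """Return the start address of a block of at least `size` words, or None."""
        order = max(size - 1, 0).bit_length()  # smallest k with 2**k >= size
        # Find the smallest free block that is large enough.
        k = order
        while k <= self.max_order and not self.free_lists[k]:
            k += 1
        if k > self.max_order:
            return None
        addr = self.free_lists[k].pop()
        # Split repeatedly, returning the upper half (the buddy) to its free list.
        while k > order:
            k -= 1
            self.free_lists[k].add(addr + (1 << k))
        self.allocated[addr] = order
        return addr

    def free(self, addr):
        """Release a block and coalesce it with its buddy while the buddy is free."""
        order = self.allocated.pop(addr)
        while order < self.max_order:
            buddy = addr ^ (1 << order)  # buddy address differs in exactly one bit
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            addr = min(addr, buddy)
            order += 1
        self.free_lists[order].add(addr)
```

Keeping one free list per block size is what makes the parallel version attractive: if each
list had its own lock, two allocations would contend only when they need blocks of the same
size, which matches the suspension behavior reported above.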

A  first  version  of  the  Id  runtime  system  has  been  specified  and  is  now  under  implementation. 
The storage management system will leverage the work of Barth, Brobst, and Young to
provide dynamic allocation of structure storage and frames for large codeblocks using the
buddy system, and in-line stack allocation of memory for cons cells and fixed-size frames.
The  I/O  subsystem  will  provide  a  primitive  interface  to  the  file  system  using  string  objects 
and  standard  system  call  interfaces  for  file  open,  close,  read,  and  write.  Extensions  to  the 
Id  language  for  synchronizing  multiple  reads  and  writes  to  a  single  file  are  an  active  area  of 
research. 

5.6.3  Long  Term  Software  Structure 

The above retargeting of Id for Monsoon uses the TTDA code that is produced from the
existing backend of the compiler. This is not an attractive route in the long term. Young
has written a specification of the ETS abstract machine [317] for use in compiling to the
Monsoon architecture as it slowly evolves.

Traub has designed the architecture of the software system which will support Monsoon, to
be jointly implemented at MIT and at Motorola Cambridge. The greatest difference between
the new software system and the old TTDA/Gita system is one of modularity. Whereas the
functions of loading Id programs, running them, debugging them, and displaying runtime
statistics were previously all handled by the Gita program, in the new architecture each of
these functions will be handled by separate programs, with a top level program provided to
present the user with essentially the same programming environment as found in the current
Id World. The resulting system will be much more robust and flexible, and will point the way
for the reimplementation of these functions onto the dataflow processor itself. Perhaps even
more importantly, these programs are designed to work both with Monsoon hardware and its
software emulator (MINT). Local area networks are an important part of the new system,
both in the use of X Windows as the framework for the user interface and in providing a
[...] to the Monsoon hardware or emulator. This will allow for easy sharing of a [...] among
several users. The software architecture and all the interfaces between programs are
thoroughly documented in [301], edited by Traub.

Hicks and Traub designed the Monsoon Object Code (MOC) format. This is the format in
which the Id compiler (and other programs) will write object files. MOC is based on CIOBL
(Common Input/Output Base Language), redesigned by Traub.

Hicks has also made an initial design of the Id Object Format. The Id Object Format
describes the data structures that will be loaded for an Id program. There will be structures
for each procedure, global constant, and code block compiled and loaded. These structures
will hold computed values and program code, and will support dynamic linking. They will
also have source information for use in debugging. All program information that is needed
at runtime will be structured using the Id Object Format. The actual object files will be
encoded into MOC.

5.6.4  Experiments with Structure-storage Management

Jamey Hicks extended the Id compiler to handle data structure release annotations. This
allows us to deallocate data structures relatively painlessly, but it is not meant to be a
language feature that users will employ. It is meant to be an experimental feature, so that
we can compare the performance of hand-annotated programs with that of automatically
annotated programs.

The syntax of the annotation is:

    @release IDENTIFIER;

or

    @release IDENTIFIER0, IDENTIFIER1, ..., IDENTIFIERn;

inside a block expression. This annotation specifies the release of the structure bound to
IDENTIFIER, when all computation enclosed within the block expression has terminated.
This only releases the storage corresponding to the top level of the structure; the compiler
cannot determine how much sharing of substructures there is in the program, so it does
not release them. The compiler inserts the synchronization code necessary to ensure that
the object is not released until all of the code in the block has terminated computation.

Inside a loop, @RELEASE actually has two meanings: if the structure is not circulated, then it is
released when the current iteration has terminated; otherwise, if the structure is circulated
in the loop, then it releases all but the first and last values of the structure when the
corresponding iteration has terminated. The release of circulating structures is accomplished
by unrolling the loop once, and not releasing the structure in the initial execution of the
body.

Here is an example of the @RELEASE annotation in the multiwave procedure:

    %%% Run several iterations of the wavefront.
    %%% Illustrates the arbitrary chaining achievable in dataflow.
    %%% Release the intermediate waves when through with them.

    def multiwave edge_vector n =
      {m = initial_wave edge_vector;
       in
         {for i <- 1 to n do
            next m = wave m;
            @release m;
          finally m }};


Young has written a simple compile-time analysis program which determines when it is safe
to deallocate structures in loops; deallocation annotations are then automatically added to
the program.

5.6.5  Resource Management in Scientific Programs

David Culler made substantial progress this year toward effective management of parallelism
and resources in dataflow programs. The problem is that exploiting parallelism to achieve
high performance invariably increases the resource requirements of a program. This
phenomenon is not particular to dataflow; it can be observed to some degree in any form of
parallel execution. However, it is particularly serious under dynamic dataflow execution,
because all the potential parallelism in a program is exposed. This means that ample
parallelism is available on a broad class of programs but, unfortunately, the resource requirements
of many programs are excessive, often leading to deadlock. Culler documented both sides
of this dilemma using parallelism and resource profiles of a variety of scientific programs
derived under an ideal dataflow execution model (supported by Gita).

In 1985, he developed a mechanism for controlling parallelism, called k-bounded loops.
Basically, loops are compiled into dataflow graphs in a manner that allows the maximum number
of concurrent iterations to be set dynamically, when the loop is invoked. This approach is
appealing for scientific programs, which are dominated by iterative computations over large,
regular data structures. It has played a central role in the evolution of tagged token dataflow
architectures toward Explicit Token Store machines and hybrid machines because it allows
the tag space to be used densely. Also, it provides a natural means of reusing resources
within iterative computations. The question he has been exploring recently is how to assign
the k-bounds automatically.
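
The idea of a k-bounded loop can be approximated in a conventional threaded setting. The
sketch below is an analogy, not the dataflow graph schema itself: it assumes a counting
semaphore stands in for the k recyclable iteration contexts, and the function name
k_bounded_loop and its return value are hypothetical. As in the mechanism described above,
k is supplied when the loop is invoked.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def k_bounded_loop(body, n, k):
    """Run body(i) for i in range(n) with at most k iterations in flight.

    Returns (results, peak), where peak is the largest number of iterations
    observed running at once.
    """
    slots = threading.Semaphore(k)     # k iteration contexts that get recycled
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def iteration(i):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return body(i)
        finally:
            with lock:
                state["active"] -= 1
            slots.release()            # hand the slot to a later iteration

    futures = []
    with ThreadPoolExecutor(max_workers=n or 1) as pool:
        for i in range(n):
            slots.acquire()            # the loop stalls once k iterations are live
            futures.append(pool.submit(iteration, i))
    return [f.result() for f in futures], state["peak"]


if __name__ == "__main__":
    results, peak = k_bounded_loop(lambda i: i * i, 20, 3)
    print(results[:5], "peak concurrency:", peak)
```

Choosing k trades parallelism against resource use: k = 1 serializes the loop, while a large
k exposes every iteration at once and uses the most storage.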

The approach Culler has taken is to rely heavily on static analysis to characterize the
dynamic behavior of programs. There are two aspects of this analysis: worst case resource
requirements and expected parallelism. A representation of the dynamic call structure of
the program is constructed and annotated with symbolic resource expressions which are
parametric in the k-bounds, and in certain program variables. In addition, loops are
classified as having limited useful unfolding, expensive unfolding, and efficient unfolding. Based
on this analysis, the program is augmented with resource management code that computes
the k-bounds by simple formulae derived from the resource expressions that capture a high
level policy, e.g., favor the middle level in this triply nested loop. A variety of policies have
been examined analytically and empirically, and a particular policy has been effective in
containing the resource requirements of scientific dataflow programs, while exposing adequate
parallelism.

This work forms the beginning of a bridge between our research in dataflow execution and the
work in parallel execution of FORTRAN. In our case, the problem is to constrain potential
parallelism that is not cost effective to exploit. In the FORTRAN case, the problem is to
determine where it is most cost effective to uncover parallelism. We will never reach exactly
the same place, because our analysis must err in the direction of assuming two computations
cannot be serialized, while theirs must err in the direction of assuming two computations
cannot execute in parallel. Still, we expect there will be a valuable cross fertilization.

5.6.6  Speculative Parallelism

Richard Soley completed his Ph.D. thesis work this year on the control of speculative
parallelism in Id programs, under the abstract tagged token dataflow execution model. Although
resource control models for exploiting the parallelism in large scientific codes have been
recently explored, no approach to exploiting speculative, searching parallelism has been
explored, even though (or perhaps because) the potential parallelism of such applications is
tremendous. Soley explores a view of speculation as a process which may proceed in parallel
in a controlled fashion, using examples from actual symbolic processing situations.

The central issue of exploiting this parallelism is the dynamic containment of the resources
necessary to execute large speculative codes. Soley shows efficient structures (graph schemata
and architectural support) for executing highly speculative programs (such as expert
systems) under a dataflow execution paradigm. In order to control dynamic execution graph
growth, Soley develops controls over cross procedure parallelism in an extensible manner,
with applications to the various current problems of dataflow computation. Approaches to
scheduling, prioritization, and search tree pruning were considered, evaluated, and compared.

In his thesis, Soley fleshes out the details of primitive execution resource management
(function application and memory allocation), giving implementations for general and
primitive resource managers and other nondeterministic constructs at the Id language level.
Dynamic binding of managers is also presented to give a meaning to the term "task"; Soley's
work supports the prioritization and termination of dynamically defined tasks.

The underlying constructs used by Soley's speculation control features rely on an extended
definition of I-structure storage. This new definition adds an uncontrolled I-structure WRITE
(as opposed to STORE) instruction, which overwrites I-structure cell contents. This
nondeterministic feature is useful for implementing higher level control constructs, as Soley shows.

More revolutionary, however, is the new cell locking paradigm developed by Soley, Steele,
and Barth. The new scheme is detailed in Soley's Ph.D. thesis, Steele's upcoming Master's
thesis, and Barth and Nikhil's report [...]. The new locking structure of I-structure memory,
already implemented in the Gita simulator (by Soley and Barth) and the Monsoon prototype
(by Steele), relies on the existence of structure presence bits and deferral lists to allow critical
section coding of resource managers and the like. In addition to supporting busy-waiting-free
lock primitives, these "dataphores" also allow the storage of data in the semaphore cell itself
(hence the new name). The basic contract of the locking instructions* is the following:


•  READ-AND-LOCK  (cell):  returns  only  when  the  cell  has  been  locked,  with  the  value 
written  to  the  cell  when  it  was  allocated  or  last  unlocked. 

•  WRITE-AND-UNLOCK  (cell,  value):  unlocks  the  cell  specified,  writing  the  given  value
into  the  cell. 


Recognizing  that  these  instructions  also  support  a  primitive  queueing  mechanism  (albeit  of 
nondeterministic  queue  order),  several  other  uses  for  this  new  feature  have  been  found.  MIT 
is  pursuing  a  patent  on  this  extension  to  dataflow  (and  other  message  passing)  architectures. 
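
The contract of the two locking instructions can be modeled in software. The sketch below
is only an analogy under stated assumptions: Python threads stand in for dataflow tokens,
a deque of waiting threads stands in for the hardware deferral list, and the class name
Dataphore with methods read_and_lock and write_and_unlock is illustrative. Waiters are
served first-in first-out here, whereas the queue order in the real mechanism is
nondeterministic.

```python
import threading
from collections import deque


class Dataphore:
    """Lock cell that also holds a value; waiters are deferred, not spinning."""

    def __init__(self, value):
        self._mutex = threading.Lock()   # protects the cell's own state
        self._value = value
        self._locked = False
        self._deferred = deque()         # stand-in for the deferral list

    def read_and_lock(self):
        """Return the cell's value once this caller holds the lock."""
        with self._mutex:
            if not self._locked:
                self._locked = True
                return self._value
            waiter = {"event": threading.Event(), "value": None}
            self._deferred.append(waiter)
        waiter["event"].wait()           # blocks without busy waiting
        return waiter["value"]

    def write_and_unlock(self, value):
        """Store value and pass the lock to one deferred reader, if any."""
        with self._mutex:
            self._value = value
            if self._deferred:
                waiter = self._deferred.popleft()  # FIFO is an assumption
                waiter["value"] = value
                waiter["event"].set()    # the cell stays locked for the new holder
            else:
                self._locked = False


def bump(cell, times):
    """Critical section: read the counter stored in the lock cell and add one."""
    for _ in range(times):
        n = cell.read_and_lock()
        cell.write_and_unlock(n + 1)
```

Because the protected value lives in the lock cell itself, a shared counter or free-list
head needs no separate storage location, which is the property the name "dataphore" refers
to.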

5.6.7  Garbage  Collection  for  Id  on  Monsoon 

Arun Iyengar has begun looking at garbage collection on dataflow multiprocessors. We are
implementing a copying garbage collector for Monsoon. Simultaneously, Young is looking
at compile-time techniques for detecting when heap objects are no longer needed. We plan
to  quantitatively  study  the  amount  of  storage  which  can  be  reclaimed  by  garbage  collection 
and  static  program  analysis.  We  are  also  interested  in  the  increased  execution  time  and 
additional  support  required  by  these  two  different  approaches  for  reclaiming  heap  storage. 

5.6.8  Parallel  I/O 

Bhaskar Guha Roy worked on the design of a parallel I/O system for a dataflow machine.
In addition to processing elements and I-structure memories, he proposed that disk units be
attached to the interconnection network. Processors would interact with the disk units using
split-phase transactions in a manner similar to I-structures. To initiate a disk transfer, the
processor sends a token to a disk unit, specifying the direction of transfer (read/write), the
address of the disk block, the address of an I-structure for the data, and the continuation of a
thread that awaits the completion of the transfer. The objective is to tolerate disk latencies
in exactly the same way that the latency of I-structure accesses is currently tolerated by the
processor. Guha Roy designed language constructs to express parallel I/O in the presence
of nonstrict data structures, and designed compilation techniques for them.
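
The four fields carried by a disk request token, and the way a split-phase transaction lets
the processor keep working, can be sketched as follows. This is a toy model under stated
assumptions: a dictionary stands in for the disk, a Python list stands in for the
I-structure, continuations are represented by thread names, and DiskRequest and
run_split_phase are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DiskRequest:
    """Token sent from a processor to a disk unit (field names are illustrative)."""
    direction: str        # "read" or "write"
    block: int            # address of the disk block
    istructure: list      # stand-in for the I-structure holding the data
    continuation: str     # name of the thread to resume on completion


def run_split_phase(disk, requests, other_work):
    """Issue all requests, do unrelated work, then service the completions."""
    log = []
    pending = deque(requests)            # phase 1: requests are sent, nothing blocks
    for task in other_work:
        log.append(f"ran {task}")        # the processor stays busy meanwhile
    while pending:                       # phase 2: the disk unit responds
        req = pending.popleft()
        if req.direction == "read":
            req.istructure[:] = disk[req.block]
        else:
            disk[req.block] = list(req.istructure)
        log.append(f"resumed {req.continuation}")
    return log
```

The point of the model is ordering: unrelated work is logged before any waiting thread is
resumed, which is how disk latency is hidden in the same manner as I-structure latency.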

* Variously called lock/unlock, read-and-lock/write-and-unlock and take/put; here we shall
use the most verbose forms.

5.6.9  Other Monsoon-related Work

Ken  Steele  and  Richard  Soley  proposed  a  design  for  integrating  virtual  memory  address 
translation  into  the  dataflow  model  [294]. 

Lina  Muryanto  and  Peter  Tan  wrote  a  compiler  that  takes  ETS  code  from  the  Id  compiler 
and  produces  MC68020  code,  so  that  Id  programs  may  be  run  on  Sun  workstations.  It  uses  a 
MIPS-like  RISC  language  as  an  intermediate  form,  to  facilitate  porting  it  to  other  machines. 
So  far,  the  compiler  only  accepts  a  small  subset  of  the  full  language,  and  much  work  remains 
to  be  done  in  optimization. 


5.7  Applications 


We  are  happy  to  report  an  increase  in  the  number  of  large  application  programs  being  written 
in  Id. 

5.7.1  Simulated  Annealing 

Stephen Brobst and Phil Kuhn implemented a number of different algorithms for simulated
annealing. Simulated annealing is a heuristic that is commonly applied to a large class of
optimization problems that are known to be NP-complete, such as scheduling and building
layout. They found that although the purely functional subset of Id did not lend itself
well to an efficient implementation, accumulators provided an elegant paradigm for handling
the nondeterministic aspects of the algorithm without sacrificing overall determinacy in the
program. They also made use of Barth's lock and unlock primitives along with structure
overwrites to implement a purely nondeterministic, nonfunctional version of the program.
The ability to overwrite structure elements without copying the full structure provided a
large reduction in the number of instructions during program execution. However, the
synchronization required for correct implementation of the algorithm in the presence of structure
overwrites actually increased the critical path length of the program. Moreover, debugging
and program design in the presence of locking and structure overwrite primitives became
substantially more difficult. Issues of deadlock, nondeterminism, read-write races, etc., which
were previously not present in the deterministic implementations, became major stumbling
blocks in the parallel execution environment.

5.7.2  DNA  Sequence  Algorithms 

DNA  sequence  data  is  accumulating  very  rapidly.  If  the  genetic  sequence  of  the  entire  human 
genome  is  determined,  databases  will  grow  by  two  to  three  orders  of  magnitude  from  their 
current  sizes.  Parallel  processing  is  becoming  increasingly  important  as  biological  sequence 
data increases. Arun Iyengar implemented several different algorithms for comparing
sequences using Id. Implicit parallelism makes Id a very easy language to use. One drawback
is the extra copying required when an aggregate data structure needs to be updated.

5.7.3  Flight  Path  Generation

For  his  Ph.D.  thesis,  Michel  Sadoune  of  the  Department  of  Aeronautics  and  Astronautics 
has  implemented  a  Terminal  Area  Trajectory  Planning  System  for  air  traffic  control. 

A  Flight  Path  Generator  is  defined  as  the  module  of  an  automated  air  traffic  control  system 
which  plans  aircraft  trajectories  in  the  terminal  area  with  respect  to  operational  constraints. 
The  flight  path  plans  have  to  be  feasible  and  must  not  violate  separation  criteria. 

The  problem  of  terminal  area  trajectory  planning  is  structured  by  putting  the  emphasis  on 
knowledge representation and air-space organization. A well defined and expressive semantics
relying on the use of flexible patterns is designed to represent aircraft motion and flight
paths.  These  patterns  are  defined  so  as  to  minimize  the  need  for  replanning  and  to  smoothly 
accommodate  operational  deviations. 

Flight paths are specified by an accumulation of constraints. A parallel, asynchronous
implementation of a computational model, based on the propagation of constraints, provides
mechanisms to efficiently build feasible flight path plans. A network of constraints is
implemented as the superposition of dataflow graphs which are synchronized distributively.

A methodology for fast and robust conflict detection between flight path plans is
introduced. It is based on a cascaded filtering of the stream of feasible flight paths and combines
the  benefits  of  a  symbolic  representation  and  of  numerical  computation  with  a  high  degree 
of  parallelism. 

The  Flight  Path  Generator  is  designed  with  the  goal  of  implementing  a  portable  and  evolving 
tool  which  could  be  inserted  in  controllers’  routine  with  minimum  disruption  of  present 
procedures. 

Flight path generation and conflict detection have been implemented in Id. The program,
which is run with various machine configurations, is composed of 600 procedures totaling
5000 lines of Id code. It is used as a test program for the Monsoon compiler.

The  conflict-free  feasible  flight  paths  which  are  generated  and  tested  in  an  Id  environment 
can  be  translated  into  Lisp  data  structures  by  using  an  interface  between  Id  and  Common 
Lisp.  They  are  then  displayed  on  the  screen  and  simulated  in  an  interactive  manner. 

5.7.4  DARPA  Image  Understanding  Benchmark 

Arthur Allman, visiting from Texas Instruments, began implementing the DARPA Image
Understanding benchmark as an Id application. This benchmark performs model-based
recognition of a 2 1/2 D "mobile" of rectangles from two 512 x 512 pixel images, one
containing intensity data (8-bit integers), the other depth data (32-bit IEEE floating point).
As such, it performs extensive numeric (data-directed) and symbolic (knowledge-directed)
processing. Once the benchmark has been converted to Id, he will evaluate its potential
parallelism and related performance parameters on the simulated TTDA target machine
provided by the Gita environment.


5.8  P-RISC

Our  work  has  continued  to  bridge  the  once-wide  gap  between  von  Neumann  and  dataflow 
architectures.  In  1988,  Nikhil  and  Arvind  proposed  a  new  processor  architecture  called  P- 
RISC  (for  Parallel  RISC)  that  properly  extends  a  conventional  RISC  processor  in  such  a  way 
as  to  make  it  more  suitable  as  a  component  for  a  parallel  machine.  The  architecture  will 
be  presented  at  the  1989  International  Symposium  on  Computer  Architecture  in  Jerusalem 
[243].

We  first  organize  the  machine  so  that  instruction  and  frame  memory  are  local  to  a  processor, 
while  heap  memory  is  global.  Next,  we  identify  a  frame  (activation  record)  as  the  register 
set  for  a  thread.  The  control  state  of  a  thread  can  now  be  described  succinctly  as  a  token 
containing  an  instruction  pointer  and  a  frame  pointer. 

We  now  reorganize  the  processor  so  that  it  is  multithreaded.  The  first  step  is  to  introduce  a
token  queue  that  can  contain  multiple  tokens.  On  each  clock  a  token  is  dequeued  and  sent 
through  the  processor  pipeline.  The  instruction  it  points  to  is  fetched  and  executed  relative 
to  the  frame  that  it  points  to.  Finally,  a  new  token  is  produced  that  is  reinserted  into  the 
token  queue.  Note  that  successive  tokens  can  be  from  unrelated  threads. 

To  deal  with  long  memory  latencies,  we  use  the  technique  of  I-structures.  A  load  instruction 
sends  a  request  to  memory  along  with  a  return  continuation.  Meanwhile,  the  processor 
is  free  to  execute  other  tokens.  The  response  from  memory  comes  back  with  the  return 
continuation — the  value  is  stored  in  the  frame  and  the  continuation  is  requeued. 

For  fine-grained  parallel  operation,  we  extend  the  instruction  set  with  three  new  instructions: 


•  fork,  which  is  like  a  jump,  except  that  it  also  produces  the  token  for  the  next  instruction 
(i.e.,  it  is  like  a  jump  and  continue). 

•  join,  which  specifies  a  frame  offset  containing  a  counter  initialized  to  n,  the  number 
of  threads  that  will  execute  this  instruction.  Each  execution  decrements  the  counter. 
Only  the  thread  that  decrements  it  to  0  continues— the  other  threads  are  discarded. 

•  start,  which  specifies  a  continuation  in  a  different  frame  (which  may  be  on  a  different 
processor),  along  with  a  value  to  be  stored  in  that  frame  before  the  continuation  is 
started. 


With  these  instructions,  it  is  possible  to  emulate  the  fine-grained  parallelism  of  a  dataflow 
graph.  Being  a  superset  of  a  conventional  RISC  instruction  set,  it  is  also  possible  to  execute 
conventional  compiled  code,  e.g.,  from  FORTRAN. 
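
The token queue and the three added instructions can be sketched as a toy interpreter. This is only an illustration under stated assumptions, not the abstract P-RISC instruction set: the tuple encoding of instructions, the `Machine` class, the frame names, and the helper opcodes `const`, `add`, `jump`, and `halt` are all hypothetical. Only the behavior of `fork`, `join`, and `start` follows the description above.

```python
from collections import deque

class Machine:
    """Toy token machine (hypothetical encoding, for illustration only)."""

    def __init__(self, program, frames):
        self.program = program   # instruction memory: list of tuples
        self.frames = frames     # frame pointer -> dict of frame slots
        self.tokens = deque()    # queue of (instruction pointer, frame pointer)

    def run(self):
        while self.tokens:
            ip, fp = self.tokens.popleft()        # dequeue one token per step
            op, *args = self.program[ip]
            frame = self.frames[fp]
            if op == "const":                     # frame[dst] = value
                dst, value = args
                frame[dst] = value
                self.tokens.append((ip + 1, fp))
            elif op == "add":                     # frame[dst] = frame[a] + frame[b]
                dst, a, b = args
                frame[dst] = frame[a] + frame[b]
                self.tokens.append((ip + 1, fp))
            elif op == "jump":
                (target,) = args
                self.tokens.append((target, fp))
            elif op == "fork":                    # jump and continue: two tokens
                (target,) = args
                self.tokens.append((target, fp))
                self.tokens.append((ip + 1, fp))
            elif op == "join":                    # only the last arrival continues
                (counter,) = args
                frame[counter] -= 1
                if frame[counter] == 0:
                    self.tokens.append((ip + 1, fp))
            elif op == "start":                   # store a value in another frame,
                src, dst_fp, dst_slot, dst_ip = args   # then start a thread there
                self.frames[dst_fp][dst_slot] = frame[src]
                self.tokens.append((dst_ip, dst_fp))
            elif op == "halt":
                pass                              # thread ends; no new token

# Two threads compute x and y, meet at a join, and the survivor
# delivers x + y to a different frame with start.
program = [
    ("fork", 3),                          # 0
    ("const", "x", 2),                    # 1
    ("jump", 4),                          # 2
    ("const", "y", 3),                    # 3
    ("join", "n"),                        # 4
    ("add", "z", "x", "y"),               # 5
    ("start", "z", "caller", "result", 7),  # 6
    ("halt",),                            # 7
]
m = Machine(program, {"caller": {}, "callee": {"n": 2}})
m.tokens.append((0, "callee"))
m.run()
print(m.frames["caller"]["result"])
```

In this sketch the thread that reaches the join first is simply dropped, and successive tokens in the queue belong to different threads, which mirrors the interleaving described above.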

We have begun simulation and other studies to evaluate this architecture, described below.

5.8.1 Compiling for P-RISC

Bradley  Kuszmaul  specified  an  abstract  P-RISC  instruction  set,  complete  with  operational 
semantics  (specified  by  a  relation  on  machine  states). 

He  is  implementing  a  P-RISC  code  generator  for  the  Id  compiler.  Code  generation  proceeds 
by transforming a data flow graph into a control flow graph, performing certain optimizations,
and then transforming the control flow graph into machine specific code. Examples of
machine  specific  code  which  might  be  generated  include: 


•  the  abstract  P-RISC  instruction  set  mentioned  above; 

•  other  specific  P-RISC  instruction  sets  (such  as  the  P-RISC  co-processor  for  a  RISC 
chip  being  worked  on  by  Sharma,  see  below); 

•  a  variant  of  Monsoon  with  registers; 

•  Eps’88; 

•  a  standard  serial  machine  (such  as  a  RISC  computer,  a  Vax,  a  Lisp  machine,  or  a  Cray 
supercomputer);  or 

•  off-the-shelf  parallel  MIMD  hardware. 


It appears that the control flow graph intermediate format is well suited for the target
architectures mentioned above. Currently only parts of the Id language are correctly compiled
to control flow graphs, and the only machine specific code generated by the compiler is
the abstract P-RISC instruction set and the serial code for the Lisp Machine. Preliminary
results indicate that it may be possible to run Id programs almost as fast as Lisp or C
programs, i.e., within a factor of four to ten.

5.8.2  Simulator  for  P-RISC 

Ira Scharf, as part of his S.B. thesis, has been working on an interpreter for the abstract
P-RISC instruction set developed by Kuszmaul. The objective is to build a tool like Gita,
our graph interpreter for the TTDA, which has proved invaluable in evaluating the TTDA.
A first version of the interpreter is now running.

5.8.3  Implementation  of  P-RISC  Using  Ordinary  RISC  Processors 

Preliminary to the P-RISC work, Kuszmaul and Sharma surveyed commercial RISC chips,
with an eye towards P-RISC implementation. We then did some design and back-of-the-envelope
analysis of various strategies for implementing P-RISC on commercial RISC hardware
(possibly using some sort of co-processor to provide a hardware assist for the P-RISC
specific operations). Those (very preliminary) results indicate that an unmodified commercial
RISC computer might lose only a factor of 10 to 15 over a dedicated P-RISC processor.
By adding an I-structure memory to the RISC computer, the performance degradation compared
to a P-RISC processor drops to a factor of four or five (Steele has spent some effort
thinking about how to make I-structure memory work for a RISC processor). By adding
hardware assist for context switching, that degradation goes down to a factor of two or three.

Sharma  has  been  trying  to  identify  a  way  of  efficiently  caching  activation  frames  of  tasks 
in  the  processor  register  set  so  as  to  minimize  the  penalty  incurred  on  switching  from  one 
thread  to  another.  We  developed  a  write-through  caching  scheme  that  caches  activation 
frames  in  a  set  of  register  windows.  We  have  also  proposed  a  scheme  which  allows  a  very 
high  degree  of  look-ahead  in  the  instruction  stream.  In  other  words,  a  processor  can  easily 
identify  the  next  15-20  instructions  to  be  executed.  We  accomplish  this  by  switching  threads 
even  on  conditional  branch  instructions — which  are  nondeterministic  in  the  sense  that  the 
flow  of  control  beyond  such  instructions  is  not  known  until  after  the  instruction  is  executed. 
Putting  these  two  schemes  together,  we  get  an  architecture  which  permits  switching  between 
threads  with  minimal  (potentially  zero)  penalty.  Further,  the  high  degree  of  look-ahead  in 
the instruction stream may offer several advantages that have eluded processor-pipeline
designers  in  the  past.  We  are  currently  examining  these. 


5.9  Functional  Databases 

Michael  Heytens  continued  his  investigation  into  the  synthesis  of  databases  and  functional 
languages,  treating  an  update  transaction  as  a  declarative  specification  of  a  new  version  of 
the  database,  inspired  by  the  treatment  of  I-structures  in  Id.  After  completing  the  design  of 
a  kernel  database  language  to  express  such  updates,  he  has  begun  implementing  a  prototype, 
based on ideas from compiling Id to P-RISC machines.

6.1 Introduction

In  the  last  year  we  have  worked  hard  at  preparing  a  cogent  proposal  for  the  final  design 
and the construction of CAM-8, a high performance cellular automata multiprocessor. The
proposal  has  finally  been  funded  by  DARPA,  and  we  are  extremely  busy  now  working  to 
deliver  the  goods. 

Some  of  the  more  theoretical  research  that  we  managed  to  carry  out  while  we  were  waiting 
for  an  answer  to  our  proposal,  and  that  is  described  in  this  report,  will  continue  at  a  slower 
pace  for  a  while,  until  the  design  of  the  VLSI  chip  that  constitutes  CAM-8’s  “heart”  is  shipped 
to  the  foundry. 

Our activity has concentrated on the following areas, discussed in more detail below:


•  Relativistic  invariance  in  parallel  computations. 

•  Solid-body  motion  in  cellular  automata. 

•  What  variational  principles  may  look  like  in  discrete  systems. 

•  Further design of CAM-8, a large cellular automata machine for (mainly) physics
emulation.

•  Symmetric and asymmetric interface formation in conservative interacting-particle
systems.

•  Pattern recognition by texture-locked loop.

•  Identification and experimental determination of specific ergodicity in invertible
dynamical systems.

6.2  Relativistic  Invariance  in  Parallel  Computations 

We have continued studying the problem of how relativistic invariance may emerge at the
macroscopic levels in cellular automata and similar discrete systems, in which such invariance
is meaningless at the microscopic level (see previous progress report).

A preliminary report on last year’s work has appeared in [299]. Work is in progress to
generalize those results to more than one dimension [292].

An alternate way of probing the relationship between Lorentz invariance and cellular
automata is through the study of wave equation models in one or more dimensions. Note that
while conventional computational models use discrete space but continuous state variables
at each site, here we are trying to get the same behavior with binary state variables.

The one dimensional version is especially tractable. Not only do we have cellular automata
rules that satisfy the wave equation exactly; we can also evaluate in closed form, via
combinatorial arguments, phase-space averages over the entire set of states of, say, a lattice
string, as well as dynamical and statistical properties involving kinetic and potential
energies, mean-square sums of the amplitudes of the normal modes, and the moments of the
Fourier components of the system.
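
As a small illustration of how binary state variables can obey a wave equation exactly, consider one simple rule of this general kind: each site holds one right-moving and one left-moving bit, and the amplitude u is the number of particles at the site. This is a sketch under our own assumptions, not necessarily the rules studied in this work; the function names are hypothetical. The amplitude then satisfies the discrete wave equation u(x,t+1) + u(x,t-1) = u(x+1,t) + u(x-1,t) at every site.

```python
def step(right, left):
    """One time step on a ring: right-movers shift by +1, left-movers by -1."""
    n = len(right)
    new_right = [right[(x - 1) % n] for x in range(n)]
    new_left = [left[(x + 1) % n] for x in range(n)]
    return new_right, new_left

def amplitude(right, left):
    """Amplitude at each site = number of particles there (0, 1, or 2)."""
    return [r + l for r, l in zip(right, left)]

def satisfies_wave_equation(right, left, steps):
    """Check u(x,t+1) + u(x,t-1) == u(x+1,t) + u(x-1,t) for `steps` steps."""
    n = len(right)
    prev = amplitude(right, left)
    right, left = step(right, left)
    cur = amplitude(right, left)
    for _ in range(steps):
        right, left = step(right, left)
        nxt = amplitude(right, left)
        for x in range(n):
            if nxt[x] + prev[x] != cur[(x + 1) % n] + cur[(x - 1) % n]:
                return False
        prev, cur = cur, nxt
    return True

right0 = [1, 0, 0, 1, 1, 0, 1, 0]
left0 = [0, 1, 1, 0, 1, 0, 0, 0]
print(satisfies_wave_equation(right0, left0, steps=20))
```

The identity holds because the right-movers that arrive at x came from x-1 and those that were at x one step earlier are now at x+1, and symmetrically for the left-movers.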

Because  these  implementations  use  particle-like  entities  to  simulate  wave  phenomena,  they 
also  naturally  lend  themselves  to  the  study  of  quantum  mechanical  phenomena,  especially 
vis-à-vis Feynman path-integral methods. Generalizing our results for the n-dimensional
wave equation to the corresponding Dirac and Weyl equations, we have obtained a novel
lattice method for simulating single-particle quantum phenomena, in the spirit of the
Bohm  interpretation. 


6.3  Solid-body  Motion  in  Cellular  Automata 

This  new  area  of  work  is  related  to  that  of  the  preceding  section. 

There has been much excitement over recent work on fluid modeling with cellular automata.
These models have been basically point models: all properties of the fluids have been
represented by the contents of individual cells. As one simulates a larger and larger range of
material parameters, this approach requires more and more bits in each cell, with an
exponential increase in the size of the lookup table (current simulations of 24-particle-per-site
lattice gases, done on a Cray X-MP, employ about one gigabit of fast memory for this purpose
[121]!).

Viewing cellular automata as a model of fine-grained parallel computation, there are
tremendous technological advantages in finding and using simple CA rules.
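
The exponential growth of the lookup table with the number of bits per cell can be made concrete with a little arithmetic. The sketch below assumes a full table with one entry per possible cell state. The `word_bits` parameter, and the idea that each entry might occupy one 64-bit word, are our assumptions and are not stated in the text.

```python
def lookup_table_bits(bits_per_cell, word_bits=None):
    """Size in bits of a full table mapping every cell state to a new state.

    If word_bits is given, each entry is assumed to occupy one machine word
    (an assumption about storage); otherwise entries are packed exactly.
    """
    entries = 2 ** bits_per_cell                  # one entry per input state
    entry_bits = bits_per_cell if word_bits is None else word_bits
    return entries * entry_bits

for n in (6, 12, 24):
    print(n, lookup_table_bits(n), lookup_table_bits(n, word_bits=64))
```

Each added bit per cell doubles the number of entries. For 24 bits per cell, the packed table needs about 4 x 10^8 bits and the word-per-entry variant about 10^9 bits, the same order of magnitude as the figure quoted above.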

In nature, complex materials are made up of simple parts (atoms) held together in groups by
various adhesive and cohesive forces. We would like to find cellular automata models with
analogous characteristics: information about complex material properties should be spread
out among a collection of cells. This involves the problem of simulating forces between
particles in an appropriate manner (with momentum and energy conservation, and preserving
reversibility) so we can have collections of particles (bodies) moving together and
interacting with one another. Since this approach provides an alternative to using extremely
complex rules, we consider the problem of not knowing how to make moving bodies (and
forces) in cellular automata to be a major obstacle standing in the way of extensive use of
cellular automata for physical modeling.

This problem is closely related to the relativity discussion of the previous section: in a
relativistically invariant system, we have the same physics in any inertial frame. We can
therefore have bulk motion of macroscopic (and microscopic) bodies, while their internal
dynamics and chemistry remain essentially unchanged. Thus if we can have (collective)
bodies at all in a relativistically invariant system, we automatically have moving bodies.
The extent to which the opposite implication holds (moving bodies imply relativity) is an
interesting question, which this research will also address.

Understanding  the  physical  aspects  of  the  discrete  entities  used  in  our  models  in  order  to 
account  for  deformations  of  solid  bodies  and  the  attendant  restoring  forces  will  lead  to  new 
intrinsic  measures  of  physical  interest,  such  as  discrete  stress  tensors. 

Among the many preliminary explorations we have made in this area, we have studied in
some detail a simple cellular automaton model of string-like objects freely moving in space
(in one, two, or three dimensions) and interacting with one another [78]. The basic object
can be thought of as a chain of point-masses connected by springs; each point can have
adjustable mass and momentum, and each link adjustable potential energy. Longitudinal and
transverse vibrations are supported, as well as average bulk motion and collisions between
objects, with strict conservation of the above quantities. Some applications of this model
are under investigation [77].


6.4 What Variational Principles May Look Like in Discrete Systems

The variational principles of mechanics characterize the solutions of certain differential
equations as continuous functions for infinitesimally small variations of which the value of
certain continuous functionals remains constant.

In many “granular” dynamical systems, such as cellular automata, both the independent
variables (space and time) and the dependent ones (state variables) are discrete, and thus
do not admit of infinitesimally small variations. Clearly, variational principles in their
traditional form cannot be employed here. On the other hand, the fundamental role played by
variational  principles  in  mathematical  physics  makes  one  suspect  that  something  having  the 
same  flavor  should  be  available  in  the  analysis  of  discrete  systems. 

Indeed, as soon as one studies these systems from a macroscopic, combinatorial viewpoint,
variational principles emerge with such regularity and strength as to make one believe that
the whole approach should be reversed. Instead of taking continuous variational principles
as the paragon, and looking for some imitation of them in discrete systems, one may take
as a productive working hypothesis that the prototypical variational principles arise from
combinatorics. Variational principles appear with such regularity in physics not because
they represent some deep-seated feature of physics proper, but because they are a corollary of
general combinatorial laws and are bound to arise whenever one considers systems consisting
of a large number of elements.

We have investigated the above topics in a variety of settings, having in mind (a) the
emergence of the concept of energy out of microscopic combinatorics, and (b) the connection
between conserved quantities and symmetries. In particular, we have discovered cellular
automata models displaying exact harmonic motion not only for infinitesimal perturbations,
but also for arbitrarily large displacements. We have started studying the equilibrium
configurations of such systems in various dissipative regimes, establishing a connection between
the variational principles that govern such configurations at a macroscopic level and the
combinatorics that governs them at the microscopic level.

6.5  CAM-8 — A  Large  Cellular  Automata  Machine  Suited  for 
Physics  Emulation 

We  have  completed  the  basic  design  of  CAM-8,  a  large,  high  performance  cellular  automata 
machine  which,  for  its  intended  areas  of  application,  will  be  by  far  the  fastest  computer  in 
the  world.  Indeed,  this  machine  will  constitute  a  “microscope”  into  “computational  worlds” 
that  were  until  now  inaccessible  (see  previous  progress  report  for  more  details),  and  thus 
will  stimulate  real,  practical  use  of  cellular  automata  as  a  modeling  environment.  Partial 
funding  for  the  actual  development  of  this  machine  has  been  granted  by  DARPA,  starting 
January  1989.  More  conceptual  aspects  of  this  architecture,  in  particular  in  the  context  of 
simulation of stylized physical systems, fall within the scope of our NSF contract.

CAM-8 is the next generation in a line of Cellular Automata Machines (CAMs) that were
developed at the MIT Laboratory for Computer Science and are already used by many investigators.
The essential elements of the CAM-8 architecture and some of its intended applications are
reported in [232] [298]. (To make the conceptual aspects and the potential applications of
such machines accessible to a wide audience, we have written a book, Cellular Automata
Machines: A New Environment for Modeling, which constitutes a comprehensive introduction
to the subject and illustrates the use of an earlier machine, CAM-6, which is in commercial
production. We have just completed the second edition of software and documentation for
CAM-6 [71].)

The functional architecture of CAM-8 is fundamentally that of a cellular automaton, in which
a large number of identical atomic processors are uniformly interconnected to form an
indefinitely extended two or three dimensional network (“polynomial interconnection”
architecture) and operated in synchronism. This approach gives CAM-8 unmatched performance in
dealing with discrete, fine-grained models of systems whose topology reflects that of ordinary
spacetime.
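
One way to read the term “polynomial interconnection” is in terms of how many processors can be reached within r hops of a given processor: in a d-dimensional mesh this number grows as a polynomial in r, whereas in a tree it grows exponentially. This reading is our assumption, as the text does not define the term, and the function names below are hypothetical. A short sketch of the comparison:

```python
def mesh_reach(radius, dims):
    """Sites within `radius` nearest-neighbor hops in a dims-dimensional mesh."""
    if dims == 0:
        return 1
    # Spend |k| hops along one axis, the rest in the remaining dimensions.
    return sum(mesh_reach(radius - abs(k), dims - 1)
               for k in range(-radius, radius + 1))

def tree_reach(radius):
    """Nodes within `radius` hops below the root of a complete binary tree."""
    return 2 ** (radius + 1) - 1

for r in (1, 3, 10):
    print(r, mesh_reach(r, 2), mesh_reach(r, 3), tree_reach(r))
```

For a two-dimensional mesh the count is 2r^2 + 2r + 1, so at r = 10 the mesh reaches 221 sites while the tree reaches 2047 nodes.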

Our  specific  implementation  of  the  basic  cellular  automaton  plan  includes  certain  refinements 
recommended  by  recent  theoretical  developments,  and  makes  use  of  a  number  of  original 
solutions suggested by the current technological context. Some of these features allow CAM-8
to retain a high level of performance even in certain areas where one might expect that an
“exponential interconnection” architecture (e.g., tree or hypercube) would be mandatory.

For many applications, this machine may be visualized as a volume of simulated
programmable matter in which a large variety of experiments on spatially-extended physical
systems can be performed rapidly and conveniently; a useful metaphor for this is a “silicon
wind-tunnel.” Other examples include the simulation of physical phenomena such as
diffusion, aggregation, and phase separation; the study of properties of materials such as plasma
and alloys, and of chemical reactions; the exploration of certain models of fundamental
physics; and a number of practical applications such as the study of how waves propagate
in a nonhomogeneous medium and are reflected by arbitrarily shaped obstacles (e.g., radar
and sonar echo analysis).

In addition, CAM-8 will constitute a powerful computer for many information processing
applications dealing with fine-grained structures having a high degree of regularity in at
least two dimensions: for instance, a “silicon retina” with real-time performance. Indeed,
this appears to be an ideal architecture for many pattern recognition and tracking tasks
[300].

Further, CAM-8 will provide a natural environment for the simulation of large scale logic
circuits; in particular, for exploring the potential of reconfigurable circuits (“downloadable
hardware”).

Finally,  a  machine  of  the  functionality  of  CAM-8  will  be  indispensable  for  designing  and 
emulating  the  algorithms  that  will  constitute  the  firmware  of  a  new  generation  of  fully 
parallel  cellular  automaton  ultracomputers. 


6.6 Symmetric and Asymmetric Interface Formation in Conservative Interacting-particle Systems

We have continued our work on interface formation in immiscible fluids, simulated on the basis of
microscopic  first  principles.  In  brief,  we  study  the  approach  to  equilibrium  of  a  system 
consisting  of  a  large  number  of  particles  of  two  kinds,  coupled  by  local  interactions,  under 
the  constraints  of  strict  invertibility  and  conservation  of  particle  species  and  total  energy. 

To  this  end,  we  have  equipped  one  CAM-6  unit  with  extended  processing  tables,  capable  of 
handling  the  more  complex  local  interactions  required  by  these  models.  Besides  refining  and 
extending  work  done  in  this  field  by  others  on  symmetric  interface  formation — which  entails 
only  binary  interaction — we  have  started  exploring  the  more  challenging  area  of  multiplet 
interactions,  which  has  allowed  us  to  model,  among  other  things,  asymmetric  surface  tension 
effects. 

The necessary energy bias in the rule due to curvature was recognized to be an
approximation to a new distributed quantity that we have called winding number density, and the
ramifications of this quantity are being investigated. These studies have also yielded
benefits by suggesting the development of graph related algorithms and topological concepts for
cellular automata.


6.7  Pattern  Recognition  and  Tracking  by  Texture  Locked  Loops 

We have continued working on a pattern recognition method that is applicable to a limited
but pervasive class of patterns: typically, natural landscape features such as rivers,
coastlines, urban agglomerations, etc., as they may appear on a satellite photograph; fingerprints
and other biological constructs; and, in general, textures and structures whose long range
spatial correlations are ultimately explainable in terms of the repeated action of simple local
mechanisms [300]. This method, which can be thought of as a generalization of the
well-known phase-locked loop, is insensitive to large amounts of noise; it can take good advantage
of the peculiar tradeoffs in computational resources offered by fine-grained parallel processors
(such as Cellular Automata Machines, the Connection Machine, and the Massively Parallel
Processor); finally, by its very nature, this method is to a certain extent “aware” of its
capabilities and its limitations.

This  approach  to  pattern  recognition  has  been  presented  to  several  technical  audiences  (GE 
Research  Labs,  Naval  Laboratory,  and  Lincoln  Labs)  and  has  been  well  received.  We  are 
now  exploring  specific  applications. 

6.8  Specific  Ergodicity 

We have started working on a new theme, namely the identification of a new quantity of
interest in dynamical systems, and its experimental determination in a number of representative
cases.

Specific ergodicity asks, for an invertible cellular automaton, what fraction of the total
information needed to identify an individual state is devoted to specifying the position of this
state on its orbit. We give empirical evidence that this question has a definite answer. A
preliminary  report  on  this  work  appeared  in  [299]. 
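
To make the question concrete, the sketch below measures one plausible version of this fraction on a very small system: for each state, the bits needed to locate it on its orbit, log2(orbit length), divided by the bits needed to identify the state. This is our reading of the description above, not the definition used in [299]. The second-order construction used to obtain an invertible automaton, the local rule, and all function names are illustrative assumptions.

```python
from itertools import product
from math import log2

def make_step(n, local_rule):
    """Second-order invertible CA on a ring of n cells.

    A state is a pair (previous, current) of bit tuples. The next row is
    local_rule(neighborhood of current) XOR previous, which can always be
    undone, so the map on states is a bijection for any local_rule.
    """
    def step(state):
        prev, cur = state
        nxt = tuple(
            local_rule(cur[(i - 1) % n], cur[i], cur[(i + 1) % n]) ^ prev[i]
            for i in range(n)
        )
        return (cur, nxt)
    return step

def orbit_length(step, state):
    """Length of the cycle through `state` (finite because step is invertible)."""
    length, s = 1, step(state)
    while s != state:
        s = step(s)
        length += 1
    return length

def orbit_information_fraction(n, local_rule):
    """Average over all states of log2(orbit length) / (bits per state)."""
    step = make_step(n, local_rule)
    rows = list(product((0, 1), repeat=n))
    total_bits = 2 * n                      # a state is two rows of n bits
    fractions = [log2(orbit_length(step, (p, c))) / total_bits
                 for p in rows for c in rows]
    return sum(fractions) / len(fractions)

def example_rule(a, b, c):
    """Arbitrary nonlinear local rule chosen only for illustration."""
    return (a | c) ^ b

fraction = orbit_information_fraction(4, example_rule)
print(fraction)
```

Exhaustive enumeration like this scales exponentially with the number of cells, which is consistent with the computing requirements discussed below.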

The  experiments  reported  in  the  above  reference  were  performed  using  a  few  AT  clones  full 
time  for  several  weeks,  which  is  equivalent  to  several  hours  on  a  typical  supercomputer.  In 
view  of  the  exponential  complexity  of  the  problem,  even  moderate  improvements  on  the 
numerical  estimates  obtained  so  far  would  require  a  drastic  increase  in  computing  power. 
For simple cellular automata, a speedup of 10,000 can be achieved with a dedicated, fully
parallel implementation consisting of a few programmable gate-array chips. We have set up
fast dedicated simulators of this kind for some simple two dimensional cellular automata,
using the largest available XILINX chips, and we are beginning to collect experimental data.

Mercury

7.1  Introduction 


Mercury is a communications mechanism that supports efficient communication among
program modules in a distributed, heterogeneous environment [212][213]. Modules act as clients
and servers: a server is a module that provides a number of procedures that can be used by
other modules, called clients, to interact with it. Communication occurs by means of call
streams; clients make calls to the procedures provided by servers over these streams. A client
is able to make three kinds of calls: synchronous calls, in which the client waits until the
call returns before making a subsequent call on that stream; asynchronous calls, in which
the client can make a number of calls on the stream without waiting and pick up the results
of the calls later; and sends, which are like asynchronous calls except that the client picks
up results only if a call terminates in an exceptional condition.
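
The three kinds of calls can be illustrated with a small in-process stand-in for a call stream. This is a sketch only, not the Mercury interface: the `CallStream` class, its method names, and the `Counter` server are hypothetical, and a single worker thread stands in for the ordered delivery of calls on one stream.

```python
from concurrent.futures import ThreadPoolExecutor

class CallStream:
    """In-process stand-in for a call stream to one server (illustrative)."""

    def __init__(self, server):
        self.server = server
        # A single worker executes calls in the order they were made.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._sends = []

    def call(self, name, *args):
        """Synchronous call: wait for the result before continuing."""
        return self._pool.submit(getattr(self.server, name), *args).result()

    def stream_call(self, name, *args):
        """Asynchronous call: return at once; claim the result later."""
        return self._pool.submit(getattr(self.server, name), *args)

    def send(self, name, *args):
        """Send: like an asynchronous call, but normal results are dropped."""
        self._sends.append(self._pool.submit(getattr(self.server, name), *args))

    def send_failures(self):
        """Wait for outstanding sends; return only the exceptions raised."""
        outcomes = [f.exception() for f in self._sends]
        self._sends = []
        return [e for e in outcomes if e is not None]

    def close(self):
        self._pool.shutdown()

class Counter:
    """Hypothetical server with one normal and one failing procedure."""
    def __init__(self):
        self.total = 0
    def add(self, n):
        self.total += n
        return self.total
    def fail(self):
        raise ValueError("exceptional termination")

stream = CallStream(Counter())
first = stream.call("add", 1)                    # synchronous
p1 = stream.stream_call("add", 2)                # asynchronous, claimed below
p2 = stream.stream_call("add", 3)
results = (p1.result(), p2.result())
stream.send("add", 4)                            # normal result is never seen
stream.send("fail")                              # only this outcome is reported
failures = stream.send_failures()
final = stream.call("add", 0)
stream.close()
print(first, results, failures, final)
```

The objects returned by `stream_call` play the role of placeholders for results that are picked up later, and `send_failures` reports only calls that ended exceptionally.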

During  the  current  year,  we  have  continued  to  work  on  the  design  and  implementation  of 
the  Mercury  communication  mechanism.  In  addition,  we  have  developed  a  new  protocol  for 
implementing  at-most-once  messages  efficiently,  and  have  started  work  on  a  new  project  to 
provide  an  object  repository  for  use  in  a  heterogeneous  network. 


7.2  A  Formal  Specification  for  Mercury  Call  Streams 

B.  Liskov  and  L.  Shrira  have  provided  a  formal  specification  for  Mercury  call  streams.  An 
important  goal  in  specifying  Mercury  streams  is  to  allow  the  different  language  veneers  to 
present  streams  differently  to  their  users.  The  intention  of  the  specification  is  then  twofold: 
first,  to  guarantee  that  the  different  veneers  “understand”  each  other;  and  second,  to  limit 
the  information  exposed  by  the  veneer  operations.  The  specification  deals  with  the  safety 
properties  of  the  protocol;  it  does  not  address  the  performance  aspects  of  call  streams. 

The  specification  uses  an  event-based  model.  It  defines  the  events  that  can  be  observed  by 
users  of  streams  and  restricts  the  legal  sequences  of  those  events.  The  specification  allows 
differences  in  veneers  by  defining  the  common  primitive  events  that  underlie  the  veneer 
operations.  In  other  words,  the  specification  does  not  define  a  set  of  stream  operations  that 
all  veneers  must  provide.  Instead,  veneers  are  free  to  define  a  convenient  set  of  operations. 
However,  the  meaning  of  any  stream  operation  provided  by  a  veneer  must  be  defined  in 
terms  of  a  legal  sequence  of  the  defined  events  that  represents  the  effect  of  that  operation 
on  the  stream.  For  example,  in  one  veneer,  user  programs  at  servers  might  explicitly  wait 
for  the  next  call  to  arrive;  while  in  another,  a  process  might  be  created  automatically  by 
the  veneer  when  a  call  arrives  without  the  user  code  having  to  wait.  The  executions  of  the 
stream  operations  in  both  veneers  must  be  explained  using  event  sequences  permitted  by 
our  specification. 

The  operations  in  different  veneers  need  not  expose  all  details  of  streams.  The  specification 
defines  the  most  that  can  be  observed  by  user  code;  veneers  are  always  free  to  hide  detail. 
For  example,  user  code  at  the  receiver  may  not  be  able  to  observe  that  a  stream  is  broken 
(i.e.,  unable  to  transmit  messages)  even  though  our  events  convey  this  information. 



7.3 Mercury/Argus Veneer

T.  Bloom  and  D.  Curtis  have  been  working  on  the  implementation  of  the  Argus  veneer. 
Stream  calls  have  been  added  to  the  client-side  veneer.  The  promises  mechanism  [213]  is 
implemented  with  a  procedural  interface  rather  than  syntactic  support.  The  Mercury  catalog 
implementation  has  been  enhanced  to  provide  both  service  and  port  registration,  and  the 
restart/recovery mechanism is fully implemented. At this point the Argus veneer is fully
functional, with the exception of support for a few additional Mercury types (vspaces) to be
incorporated. Work has started on designing a test suite for use with all the veneers and on
performance analysis. D. Curtis and her students have implemented a calendar application
and a distributed login-uid as Mercury services built in Argus.


7.4 Transactions in Mercury


B. Liskov and W. Weihl have developed a design for the protocols to be used in messages
concerning transactions in Mercury. Mercury transactions will be compatible with those in
Argus so that Argus servers can be accessed under Mercury. However, some changes from
the Argus protocols are needed because the constraints in Mercury are somewhat different
from those in Argus. For example, in Argus every remote call must be made as a subaction; Mercury
does not require this. Making a call as a subaction insulates the caller from failures that
occur in the call: if the call does not complete, or it aborts at the callee, the calling action
need not abort. If there is no subaction, these circumstances will force the calling action to
abort, and the messages exchanged in this case must indicate this fact.
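
The difference between the two cases can be sketched in a few lines. The `Abort` exception and the `run_action` function are hypothetical, and the sketch ignores the actual undoing of state changes; it only shows which failures propagate to the calling action.

```python
class Abort(Exception):
    """Raised when a remote call fails to complete or aborts at the callee."""

def run_action(calls, use_subactions):
    """Run remote calls inside one action; return (committed, results)."""
    results = []
    for call in calls:
        try:
            results.append(call())
        except Abort:
            if use_subactions:
                # Only the subaction aborts; the calling action may carry on,
                # for example by trying another server.
                results.append(None)
            else:
                # No subaction: the failure forces the calling action to abort.
                return False, []
    return True, results

def ok():
    return "done"

def failing():
    raise Abort()

print(run_action([ok, failing, ok], use_subactions=True))
print(run_action([ok, failing, ok], use_subactions=False))
```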

B.  Liskov  and  W.  Weihl  also  defined  a  new  entity  called  a  transaction  management  server. 
Such  servers  will  run  at  many  Mercury  nodes  and  can  be  used  across  the  net  via  Mercury 
Soi  earns.  A  transaction  management  server  performs  various  housekeeping  chores  associated 
with  transactions  on  behalf  of  clients.  It  is  advantageous  for  two  reasons; 

1.  It  reduces  the  work  needed  to  implement  transactions  in  C  and  Lisp.  Instead  these 
veneers  can  call  on  the  server  to  do  much  of  the  work  in  implementing  transactions. 

2.  It  can  reduce  the  probability  of  a  failure  occurring  in  the  middle  of  two-phase  commit. 
This  is  possible  because  the  servers  can  be  located  at  more  reliable  nodes,  and  will  not 
be  directly  under  the  control  of  users  who  might,  for  example,  turn  off  the  machine 
they  are  using. 

The  servers  themselves  are  easy  to  implement,  since  they  are  simply  specialized  Argus 
guardians.  Also,  the  server  design  does  not  require  extra  communication  that  would  delay 
the  execution  of  user  transactions.  For  example,  it  is  not  necessary  to  communicate 
with  the  server  to  create  a  transaction  or  to  make  a  call.  Instead,  the  server  is  used  only 
at  two-phase  commit  and  delays  the  commit  of  the  transaction  as  seen  by  the  user  by  one 
message  delay. 


7.5  Efficient  At-most-once  Messages  Based  on  Synchronized  Clocks 


B.  Liskov,  L.  Shrira  and  J.  Wroclawski  [214]  have  designed  a  new  efficient  message  passing 
protocol  that  guarantees  at-most-once  message  delivery  without  requiring  communication  to 
establish  connections.  The  goal  is  to  be  able  to  accept  messages  most  of  the  time  even  when 
the  receiving  module  has  no  state  information  stored  about  the  sending  module.  The  scheme 
is  interesting  because  it  allows  us  to  efficiently  implement  at-most-once  remote  procedure 
calls  (RPCs),  even  when  there  are  large  numbers  of  clients  and  servers  and  when  clients 
communicate  with  servers  only  occasionally. 

At-most-once  semantics  for  RPCs  means  that  a  call  is  guaranteed  to  be  executed  at  most  once 
even  when  failures  occur  such  as  a  crash  of  the  receiving  module.  It  is  desirable  because  it 
provides  proper  semantics  even  when  calls  are  not  idempotent.  However,  the  implementation 
of  at-most-once  semantics  can  be  expensive  because  the  server  needs  a  way  of  determining 
whether  it  has  seen  a  message  before.  The  determination  can  be  made  if  the  server  maintains 
some  state,  known  as  a  connection,  for  the  client.  If  there  is  no  state,  the  connection  must  be 
established,  which  typically  requires  a  pair  of  messages  to  be  exchanged  between  the  client 
and  the  server.  If  the  connection  is  used  for  many  calls,  the  cost  of  the  connection  setup  can 
be  amortized  across  all  of  them.  If  there  are  only  a  few  calls,  the  overhead  is  high  relative  to 
useful  work.  In  the  worst  case,  only  one  call  will  be  made  on  the  connection,  and  the  cost 
of  the  call  is  doubled.  Yet  this  case  may  be  quite  common;  it  corresponds  to  clients  using 
servers  only  occasionally. 
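
The amortization argument can be made concrete with a small worked formula. As an assumption for illustration only, suppose each call costs one request/reply pair (two messages) and connection setup costs one further pair; then the number of messages per call, m(n), when n calls share one connection is:

```latex
\[
  m(n) \;=\; \frac{2 + 2n}{n} \;=\; 2\left(1 + \frac{1}{n}\right),
  \qquad m(1) = 4,
  \qquad \lim_{n \to \infty} m(n) = 2 .
\]
```

Under these assumptions a single call on a fresh connection costs four messages instead of two, which is the doubling described above, while the setup overhead becomes negligible once many calls share the connection.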

To  avoid  the  cost  in  this  common  case,  systems  have  provided  at-least-once  semantics,  which 
provides  only  weak  guarantees  about  how  many  times  a  call  is  executed.  For  example, 
even  when  a  call  terminates  normally,  it  may  have  been  executed  more  than  once.  Some 
systems  provide  at-least-once  semantics  as  the  only  option;  others  provide  it  as  an  alternative 
available  to  the  client  if  desired.  Both  approaches  are  undesirable:  with  only  at-least-once 
available,  the  application  programmer  must  cope  explicitly  with  the  problems  arising  from 
non-idempotent  calls.  Things  are  better  when  both  are  available,  but  the  communication 
system  is  more  complicated  than  if  there  is  just  one  choice. 

Our  work  shows  that  it  is  practical  and  efficient  to  provide  only  at-most-once  semantics.  The 
method  allows  calls  to  be  made  without  prior  communication  to  establish  a  connection.  Ours 
is  not  the  first  method  to  do  this;  the  Delta-t  protocol  [307]  also  avoids  connection  setup. 
However,  we  use  a  different  technique  based  on  loosely  synchronized,  monotonic  clocks.  Our 
protocol  can  easily  tolerate  the  clock  skews  provided  by  existing  clock  synchronization  protocols; 
these  skews  are  typically  less  than  100  milliseconds.  If  the  rare  event  of  unsynchronized 
clocks  does  occur,  the  protocol  continues  to  work  correctly  although  there  is  a  degradation  of 
performance.  The  protocol  requires  that  clocks  at  servers  that  survive  crashes  be  monotonic; 
it  does  not  rely  on  properties  of  clients’  clocks  for  correctness. 
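
One way a receiver might use timestamps to reject duplicates without a connection handshake is sketched below in Python. This is a simplified illustration under stated assumptions, not the protocol of [214]: the class `Receiver`, the parameter `keep_for`, and the bound `upper` are hypothetical names, senders are assumed to stamp each message with their local clock, and crash recovery (where the monotonic server clock matters) is not modelled.

```python
class Receiver:
    """Sketch of timestamp-based duplicate rejection at a server."""

    def __init__(self, keep_for):
        self.keep_for = keep_for  # how long (seconds) to remember a sender
        self.last_seen = {}       # sender id -> latest accepted timestamp
        self.upper = 0.0          # bound on timestamps of forgotten senders

    def accept(self, sender, timestamp, now):
        """Return True if the message should be delivered, False if rejected."""
        self._forget_old(now)
        last = self.last_seen.get(sender)
        if last is not None:
            fresh = timestamp > last
        else:
            # No state for this sender: accept only if the message is newer
            # than anything that may already have been forgotten.
            fresh = timestamp > self.upper
        if fresh:
            self.last_seen[sender] = timestamp
        return fresh

    def _forget_old(self, now):
        """Drop state for senders whose last message is older than keep_for."""
        cutoff = now - self.keep_for
        for sender, ts in list(self.last_seen.items()):
            if ts <= cutoff:
                del self.last_seen[sender]
                self.upper = max(self.upper, ts)
```

In this sketch a message from an unknown sender is normally accepted at once, because a reasonably synchronized sender clock produces a timestamp above `upper`. A new message whose timestamp is too old is rejected along with true duplicates; that costs a retry but never a second execution, which mirrors the claim that badly synchronized clocks degrade performance rather than correctness.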

We  used  the  message  protocol  to  implement  at-most-once  RPCs  based  on  the  SunRPC  library 
and  compared  our  performance  with  at-least-once  and  at-most-once  RPCs  already  available 
in  the  SunRPC  library.  Our  performance  measurements  indicate  that  at-most-once  RPCs 
can  be  provided  at  the  same  cost  as  less  desirable  ones  that  do  not  guarantee  at-most-once 
execution. 


7.6  Object  Repository 


B.  Liskov  and  L.  Shrira  and  a  group  of  students  have  been  working  on  the  design  of  an 
Object  Repository.  Programs  of  today  make  use  of  file  systems  to  store  data  that  must 
survive  from  one  day  to  the  next.  Programs  of  the  future  will  use  object  repositories  instead. 
Object  repositories  are  better  suited  to  the  needs  of  programs  and  users  because  they  correct 
several  deficiencies  of  file  systems,  as  discussed  below. 

Our  repository  will  provide  the  following  features:  it  will  be  language  independent,  store 
typed  objects,  support  atomic  transactions,  be  both  highly  reliable  and  highly  available, 
and  will  control  access  to  its  objects.  The  repository  fits  in  well  with  the  work  on  Mercury, 
which  provides  a  method  for  clients  to  use  the  repository  through  its  call  streams  and  provides 
a  language-independent  type  system  that  can  be  used  in  the  repository.  It  will  also  be  a 
useful  service  to  be  made  available  through  Mercury. 

Some  aspects  of  the  repository  are: 

Objects  vs.  Files:  An  object  repository  stores  objects  instead  of  files.  Objects  differ  from 
files  in  three  important  ways.  First,  they  are  often  small.  File  systems  tend  to  be  biased 
toward  large  objects,  so  that  users  must  combine  small  objects  together  into  large  ones  to 
use  the  system  efficiently.  By  contrast,  an  object  repository  must  be  engineered  to  work 
efficiently  for  small  objects  as  well. 

Secondly,  an  object  repository  knows  about  the  types  of  objects  stored  in  it.  File  systems 
do  not  have  such  information,  so  they  provide  no  help  for  users  to  avoid  type  errors.  An 
object  repository  does  provide  such  help.  Of  course,  the  types  in  use  in  the  repository  must 
be  independent  of  particular  programming  languages,  since  we  want  to  allow  many  different 
languages  to  use,  and  communicate  through,  the  repository.  We  already  have  such  a  type 
system,  namely,  the  type  system  developed  for  communication  in  Mercury.  Furthermore,  the 
Mercury  library  can  be  relied  upon  to  store  information  about  types,  and  to  give  a  system-wide 
meaning  to  types,  thus  permitting  us  to  avoid  type  errors  in  the  object  repository. 

The  type  system  for  the  object  repository  must  include  abstract  data  types  because  the 
repository  will  need  to  invoke  operations  of  a  type  when  processing  queries.  The  operations 
must  be  defined  by  the  person  who  defines  the  new  type. 

The  third  point  is  that  objects  in  an  object  repository  may  refer  to  other  objects  in  the  repository, 
thus  representing  the  kinds  of  sharing  structures  that  are  often  useful  in  programs. 
By  contrast,  file  systems  do  not  allow  files  to  refer  to  other  files  in  a  way  understood  by 
the  system.  Having  interconnected  objects  raises  interesting  questions  related  to  naming 
the  objects,  reading  complex  objects,  allocating  and  deallocating  memory  for  objects,  and 
garbage  collection. 

Transactions:  Interactions  with  the  object  repository  will  occur  within  atomic  transactions; 
we  expect  to  use  the  Mercury  mechanism  here.  However,  an  object  repository  is  likely  to 
use  a  transaction  mechanism  in  a  nonstandard  way.  For  example,  a  program  development 
support  system  might  be  implemented  on  top  of  an  object  repository.  When  a  person  is 
working  on  a  new  release  of  a  particular  module,  he  is  likely  to  have  that  object  locked  for  a 
long  time.  We  would  not  want  to  provide  such  an  ability  by  running  the  entire  production 
of  the  new  release  as  a  transaction,  since  holding  locks  for  a  long  time  is  not  good  for 
system  performance.  Instead  we  need  to  devise  a  different  kind  of  interface,  probably  of  the 
“check-out/check-in”  variety.  The  design  of  such  an  interface  is  a  challenging  problem. 

Availability  and  Reliability:  The  repository  must  be  both  highly  available,  so  that  individual 
objects  are  very  likely  to  be  usable  when  needed,  and  highly  reliable,  so  that,  with  high  probability, 
information  entrusted  to  it  is  not  lost.  The  plan  is  to  store  objects  in  the 
repository  at  a  small  number  of  server  nodes.  To  speed  up  interaction  with  the  repository, 
clients  will  maintain  caches  containing  recently  used  objects.  To  achieve  high  availability,  we 
will  need  to  use  replication.  Our  new  primary  copy  technique  [246]  should  be  good  for  the 
repository.  Copies  of  each  object  will  reside  at  several  servers;  the  database  as  a  whole  will 
be  partitioned  so  that  the  load  at  the  servers  that  implement  the  repository  will  be  balanced. 

If  the  servers  that  store  copies  of  an  object  are  sufficiently  failure  independent,  then  a  highly 
available  system  is  also  highly  reliable.  We  may  choose  to  achieve  failure  independence  by 
equipping  our  servers  with  uninterruptible  power  supplies.  In  addition,  we  plan  to  investigate  both 
archival  mechanisms  and  backup  mechanisms  to  be  used  when  catastrophes  occur. 

Access  Control:  Sharing  is  only  useful  if  it  can  be  controlled.  For  example,  sensitive 
data  will  be  stored  in  the  repository  only  if  reading  can  be  controlled.  Such  control  can  be 
achieved  by  access  control  mechanisms  based  on  the  authentication  methods  being  developed 
for  Mercury. 

Language  Interface:  T.  Bloom  and  S.  Zdonik  (of  Brown  University)  have  been  working  on 
issues  of  object-oriented  database  design.  They  have  been  looking  at  the  problem  of  merging 
object-oriented  languages  and  databases  into  seamless  database  programming  languages  with 
uniform  access  to  all  objects.  This  work  is  related  to  the  way  the  object  repository  will 
interface  to  Mercury  host  languages. 


8.1  Introduction 

The  1988-89  academic  year  marked  the  final  year  of  operation  of  the  Parallel  Processing 
Group,  due  to  the  group  leader’s  departure  from  MIT. 

The  group’s  focus  has  been  to  learn  how  to  build  parallel  processors  that  can  be  programmed 
for  general  purpose  applications.  The  group’s  efforts  are  based  on  the  parallel  Lisp  language 
Multilisp  [147][149][148].  Members  of  the  group  have  worked  on  several  aspects  of  parallel 
processing:  implementation  of  parallel  Lisp  systems,  speculative  computation,  applications 
for  parallel  Lisp,  parallel  program  debugging  and  tuning  aids,  and  design  of  architectures 
well  suited  for  parallel  Lisp. 

A  major  milestone  during  the  year  has  been  the  completion  of  the  Mul-T  high  performance 
parallel  Lisp  system  [189],  which  compiles  code  for  the  Encore  Multimax  multiprocessor  and 
largely  obsoletes  the  group’s  earlier  (interpreter-based)  Multilisp  implementation  hosted  on 
the  Concert  multiprocessor  [147][149][152].  Being  smaller  and  more  malleable,  the  Concert 
implementation  of  Multilisp  continues  to  be  useful  for  quick  experiments  with  modifications 
of  Multilisp,  but  the  Mul-T  system  has  performance  that  is  better  by  two  orders  of 
magnitude. 

Other  milestones  include  the  successful  demonstration  of  a  Multilisp  system  that  supports 
speculative  computation  and  the  completion  of  ParVis,  a  tool  for  debugging  and  tuning  parallel 
Lisp  programs.  Investigations  of  the  parameters  affecting  performance  of  parallel  Time 
Warp  [172]  simulations,  and  of  naming  problems  in  Lisp  systems,  were  conducted.  Finally, 
a  preliminary  version  of  MARCH,  a  processor  architecture  capable  of  efficiently  executing 
parallel  Lisp  programs,  was  studied  via  simulation,  and  directions  for  future  improvement 
were  identified.  A  notable  result  of  this  study  was  the  design  of  control  logic  for  a  coherent 
cache  that  can  connect  to  a  multithreaded  processor  and  a  split-transaction  bus. 

The  following  sections  describe  each  of  the  above  mentioned  aspects  of  the  group’s  activity 
in  more  detail. 


8.2  High  Performance  Parallel  Lisp 

A  major  accomplishment  for  the  year  was  the  completion  (by  D.  Kranz  and  E.  Mohr)  of 
the  Mul-T  high  performance  parallel  Lisp  system  [189],  which  runs  on  an  Encore  Multimax 
multiprocessor.  This  effort  began  with  the  T  system  from  Yale,  which  implements  a  dialect 
of  the  Scheme  language  [1][272].  Mul-T  is  a  parallel  version  of  T  with  a  parallel  garbage 
collector.  Mul-T  is  most  notable  for  its  high  performance:  use  of  the  T  system’s  ORBIT 
compiler  [190][188]  leads  to  performance  about  100  times  faster  than  the  group’s  earlier 
Multilisp  implementation  on  the  Concert  multiprocessor. 

Mul-T  shows  that  Multilisp’s  future  construct  can  be  implemented  cheaply  enough  to  be 
useful  in  a  production-quality  system.  On  the  Boyer  Lisp  benchmark,  for  example,  Mul-T 
was  able  to  achieve  higher  performance  even  on  two  processors  than  is  achieved  by  running 
the  sequential  Boyer  program  in  the  sequential  T  system  (whose  compiler  is  as  good  as  or 
better  than  competitive  commercial  Lisp  compilers)  [189].  In  addition  to  its  performance 
advantages,  Mul-T  includes  a  set  of  debugging  functions  that  extend  T’s  debugging  features 
to  handle  a  parallel  execution  environment. 

Innovations  in  Mul-T  include  the  use  of  inlining  as  a  run-time  mechanism  to  increase  task 
granularity  and  reduce  task-management  costs,  and  the  concept  of  groups  of  tasks  which 
provide  convenient  units  to  manipulate  when  debugging.  Mul-T  has  been  made  available 
at  no  charge  through  network  file  transfer,  or  at  a  nominal  charge  from  Encore  Computer 
Corporation. 


8.3  Speculative  Computation 


Our  investigation  of  the  design  of  mechanisms  to  support  speculative  computation  in  Multilisp 
continues.  Speculative  computation  is  eager  evaluation  where  the  result(s)  of  the 
evaluation  may  be  unnecessary.  It  is  a  gamble  whereby  one  trades  additional,  possibly  unnecessary, 
computation  for  potentially  faster  execution.  Speculative  computation  contrasts 
with  mandatory  computation,  in  which  all  computations  are  presumed  to  be  necessary.  Speculative 
computation  requires  a  means  to  control  computation  to  favor  the  most  promising 
computations,  and  the  ability  to  abort  computation  and  reclaim  computation  resources. 

Our  interpreter-based  implementation  of  Multilisp  [147][149][148]  has  been  extended  (by 
R.  Osborne)  with  constructs  for  speculative  computation,  and  performance  of  several  speculative 
programs  has  been  measured.  These  measurements  demonstrate  that  performing 
computations  in  parallel  before  their  results  are  known  to  be  required  can  yield  performance 
improvements  over  conventional  approaches  to  parallel  computing.  On  the  Boyer  theorem-proving 
benchmark  and  a  traveling-salesman  application,  speculative  computation  yielded 
performance  improvements  of  up  to  a  factor  of  2  over  the  best  program  using  only  mandatory 
constructs,  while  at  the  same  time  eliminating  the  tuning  of  parameters  needed  to  achieve 
that  performance  in  the  mandatory  arena.  On  a  heuristic  program  to  solve  the  8-puzzle 
[249],  a  performance  increase  of  a  factor  of  26  was  measured. 

The  main  conceptual  contribution  of  this  work  is  a  sponsor  model  that  provides  a  framework 
for  management  of  speculative  computation.  This  sponsor  model  handles  control  and  reclamation 
of  computation  in  a  single,  elegant  framework.  A  sponsor  is  an  agent  that  controls 
the  allocation  of  resources  to  computation.  A  sponsor  supplies  attributes  (such  as  a  priority, 
or  a  claim  on  a  certain  quantity  of  compute  time)  to  computations  that  are  sponsored  by 
it.  Every  running  task  is  sponsored  by  one  or  more  sponsors,  and  every  sponsor  can  sponsor 
several  tasks,  as  well  as  sponsor  other  sponsors.  Thus,  sponsors  can  be  organized  into  hierarchical 
networks  that  mirror  the  structure  of  the  computation,  and  sponsors  are  a  modularity 
construct  that  provides  a  “handle”  on  a  subcomputation  that  can  be  used  without  explicit 
reference  to  the  set  of  tasks  currently  performing  that  subcomputation.  The  contribution 
of  the  current  work  is  the  development  of  the  basic  sponsor  model  (inspired  by  the  work  of 
W.  Kornfeld  and  C.  Hewitt  [186])  into  a  concrete  mechanism,  illustration  of  how  it  can  be 
implemented  with  acceptable  efficiency,  and  demonstration  of  its  application  to  speculative 
computation  scenarios  including  side  effects  and  complex  inter-task  dependencies. 
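
The idea of a sponsor as a "handle" on a subcomputation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the Multilisp mechanism: the names `Sponsor`, `sponsor_task`, `withdraw`, `runnable` and `schedule` are hypothetical, each task has a single sponsor here (the model allows several), and the only attribute supplied is a numeric priority.

```python
class Sponsor:
    """An agent that supplies a priority to the tasks and sponsors under it."""

    def __init__(self, name, priority, parent=None):
        self.name = name
        self.priority = priority
        self.children = []   # sub-sponsors
        self.tasks = []      # names of tasks sponsored directly
        self.active = True
        if parent is not None:
            parent.children.append(self)

    def sponsor_task(self, task_name):
        self.tasks.append(task_name)

    def withdraw(self):
        """Withdraw sponsorship from this whole subcomputation."""
        self.active = False
        for child in self.children:
            child.withdraw()

    def runnable(self):
        """Yield (priority, task) for every task still sponsored below here."""
        if not self.active:
            return
        for task in self.tasks:
            yield self.priority, task
        for child in self.children:
            yield from child.runnable()


def schedule(root):
    """Order runnable tasks by priority, highest first."""
    return [task for _, task in sorted(root.runnable(), key=lambda p: -p[0])]
```

Calling `withdraw` on one sponsor removes every task beneath it from the schedule without the caller having to know which tasks those are, which is the modularity property described above.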


8.4  Naming  in  Lisp  Systems 

A  separate  exploration  (by  J.  Loaiza)  focused  on  the  problem  of  defining  and  creating  modules 
in  Lisp.  Three  general  styles  of  specifying  and  creating  modules  were  identified.  The 
three  styles  vary  in  ease  of  use,  simplicity,  and  modularity.  A  module-system  design  was 
developed  that  treats  a  module  as  an  independent  object  that  requires  a  set  of  input  values 
and  produces  a  set  of  output  values.  The  code  that  interconnects  modules  is  kept  separate 
from  the  code  that  implements  modules.  Methods  of  making  dynamic  modifications  to  a 
system  of  modules  were  also  explored. 


8.5  Analysis  of  Scheduling  in  Parallel  Simulations 

Analysis  of  a  message-based  system  for  concurrent  simulation  in  Multilisp,  based  on  the  Time 
Warp  system  of  D.  Jefferson  [172],  was  completed  (by  M.  Ma).  Time  Warp  is  based  on  the 
paradigm  of  tasks  exchanging  messages.  Each  message  has  a  “virtual,”  or  simulated  time; 
messages  received  by  a  task  are  to  be  processed  in  order  of  increasing  virtual  time.  Time 
Warp  uses  an  optimistic  concurrency  control  strategy  in  which  processors  eagerly  process 
available  messages  even  when  it  is  possible  that  a  given  message  will  be  processed  before 
another  message  with  a  lower  virtual  time  but  a  later  real  time  of  arrival  at  its  destination 
task.  When  this  occurs,  the  destination  task  must  be  backed  up  to  an  earlier  state  and 
re-executed  from  that  point  so  that  the  required  virtual  time  order  is  not  violated.  Backups 
clearly  represent  wasted  work  and  should  be  minimized.  The  number  of  backups  is  thus  an 
important  performance  measure,  along  with  the  total  time  taken  to  execute  a  simulation. 
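
The backup step can be illustrated with a small Python sketch of a single task. It is a simplification for illustration and not the group's Multilisp implementation: the class `Task` and its update rule (`state * 2 + value`, chosen only because it makes the result depend on processing order) are hypothetical, and the cancellation of messages already sent to other tasks is not modelled.

```python
class Task:
    """A simulated task that processes messages optimistically."""

    def __init__(self):
        self.state = 0
        self.processed = []   # (virtual_time, value, state_before), in order
        self.backups = 0

    def receive(self, virtual_time, value):
        """Process a message eagerly, backing up first if it arrives late."""
        redo = []
        if self.processed and virtual_time < self.processed[-1][0]:
            self.backups += 1
            # Undo every message with a later virtual time than the straggler.
            while self.processed and self.processed[-1][0] > virtual_time:
                vt, val, before = self.processed.pop()
                self.state = before
                redo.append((vt, val))
        self._apply(virtual_time, value)
        for vt, val in reversed(redo):  # re-execute in increasing virtual time
            self._apply(vt, val)

    def _apply(self, virtual_time, value):
        # Save the prior state so this step can be undone later.
        self.processed.append((virtual_time, value, self.state))
        self.state = self.state * 2 + value
```

If messages with virtual times 1 and 3 are processed before one with virtual time 2 arrives, the task undoes the work for time 3, applies the message for time 2, and then redoes time 3, ending in the same state as a strictly ordered run at the price of one backup.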

Parameters  that  affect  the  performance  of  our  simulation  system  were  investigated  using  various 
simulations,  including  digital  circuits,  a  queuing  system,  and  simulations  of  message  flow 
in  communication  networks  with  grid  and  butterfly  topologies.  The  parameters  investigated 
include  different  choices  of  scheduling  algorithm,  based  not  only  on  whether  the  scheduling 
algorithm  was  static,  dynamic  or  nondeterministic,  but  also  on  the  method  of  grouping  tasks 
into  “partitions”  (scheduling  units).  Certain  methods  of  partitioning  yielded  much  better 
results  (processing  time  and  number  of  backups)  than  others.  A  model  describing  the  underlying 
phenomena  that  govern  the  performance  of  Time  Warp  simulations  was  developed 
and  evaluated.  A  key  predictor  of  performance  is  the  extent  to  which  the  scheduling  method 
chosen  succeeds  in  minimizing  the  variations  in  virtual  times  between  partitions  (variations 
within  a  partition  were  not  so  important). 


8.6  Parallel  Program  Development  Aids 

A  difficult  and  important  problem  in  programming  parallel  processors  lies  in  understanding 
the  behavior  of  programs.  Examples  include  determining  where  time  is  being  spent  in  various 
parts  of  a  program,  which  parts  are  being  executed  in  parallel,  and  where  the  bottlenecks 
are  located.  L.  Bagnall  completed  development  of  a  tool,  ParVis  (Parallel  Visualization), 
for  visualizing  the  execution  of  Multilisp  programs. 

During  a  Multilisp  program  run,  ParVis  records  the  time  of  events  representing  task  state 
transitions  and  intercommunication,  such  as  task  creation,  blocking,  resumption,  and  termination. 
ParVis  can  then  generate  a  graphical  display  of  this  information.  The  system 
provides  a  display  with  interactive  features  such  as  scrolling  and  zooming,  which  allow  the 
user  to  examine  various  parts  of  the  display. 

Because  the  program  visualization  utility  is  not  interactive,  but  creates  a  display  after  a 
program  run,  ParVis  communicates  via  data  files  rather  than  via  an  interactive  network 
connection.  Implementations  of  both  Multilisp  and  Mul-T  have  been  modified  to  generate 
the  necessary  trace  data  files. 

Because  the  additional  information  described  by  ParVis  can  lead  to  large,  complex  displays, 
a  filter  language  is  provided  to  allow  the  programmer  to  specify  the  displayed  items  that  are 
of  interest.  Particularly  notable  is  the  interface  between  the  filter  language  and  the  graphical 
display,  which  allows  easy  inclusion  of  display  elements  in  filter  definitions. 

ParVis  has  already  been  used  extensively  by  several  group  members  to  find  performance 
bottlenecks  and  analyze  the  effect  of  different  scheduling  disciplines  (e.g.,  for  speculative 
computation)  on  performance. 


8.7  Architecture  for  Parallel  Processing 


An  effort  to  learn  how  innovations  in  parallel  architectures  could  improve  the  performance 
of  parallel  Lisp  programs  is  being  pursued  (by  R.  Halstead,  D.  Nussbaum,  H.  Takagi,  and 
I.  Vuong-Adlerberg)  through  the  development  and  evaluation  of  a  processor  architecture 
called  MARCH  (Multilisp  ARCHitecture).  This  project  began  in  the  previous  year  with  the 
specification  of  MASA  [150],  a  processor  architecture  inspired  by  the  HEP-1  [187]  and  SPUR 
architectures  [162][297].  MARCH  differs  from  MASA  in  a  number  of  details  concerning 
procedure  linkage,  trap  handling  and  task  management,  which  have  been  revised  in  light  of 
experience  with  writing  run-time  support  routines  for  MASA. 

Like  MASA,  MARCH  features  several  non-overlapping  register  sets  that  can  be  used  in  a 
manner  like  SPUR’s  register  windows  to  reduce  memory  accesses  associated  with  procedure 
invocation.  Alternatively,  register  sets  can  be  allocated  to  concurrently  executing  tasks 
assigned  to  the  same  processor.  MARCH  thus  aims  to  make  procedure  linkage  efficient 
and  make  task  creation  equally  efficient.  MARCH’s  pipeline,  like  that  of  the  HEP-1,  can  be 
filled  by  issuing  instructions  from  different  tasks  on  consecutive  clock  cycles.  The  fast  context 
switching  implicit  in  the  HEP-style  instruction  issue  should  also  help  bridge  memory  access 
latencies  by  allowing  other  tasks’  instructions  to  be  executed  while  awaiting  completion  of  a 
memory  operation. 

Like  SPUR,  MARCH  depends  on  software  trap  handlers  to  handle  infrequent  conditions  and 
manage  processor  resources  such  as  register  sets.  In  particular,  a  scheduler  must  manage  the 
mapping  from  the  (unbounded)  set  of  runnable  tasks  to  the  (finite)  set  of  register  frames, 
and  a  frame  saver  must  be  invoked  whenever  a  request  for  a  register  frame  is  made  and  there 
are  currently  no  free  frames  to  satisfy  it. 
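
The interplay between a finite set of register frames and the frame saver can be sketched in Python as follows. This is an illustrative model only, not MARCH's trap-handler code: the class `FrameTable`, the method names, and the least-recently-used choice of which frame to spill are all assumptions made for the example.

```python
from collections import OrderedDict


class FrameTable:
    """Maps an unbounded set of tasks onto a fixed number of register frames."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.loaded = OrderedDict()  # task -> frame contents, oldest first
        self.saved = {}              # task -> frame contents spilled to memory
        self.saves = 0               # how many times the frame saver ran

    def request_frame(self, task):
        """Give `task` a register frame, spilling another task if none is free."""
        if task in self.loaded:
            self.loaded.move_to_end(task)
            return
        if len(self.loaded) >= self.num_frames:
            self._frame_saver()
        # Restore earlier contents if the task was spilled before.
        self.loaded[task] = self.saved.pop(task, {})

    def _frame_saver(self):
        """Spill the least recently used frame to memory (a policy assumption)."""
        victim, contents = self.loaded.popitem(last=False)
        self.saved[victim] = contents
        self.saves += 1
```

The counter `saves` stands in for the housekeeping cost: every request that arrives when no frame is free pays for one spill, which is why the choice of scheduling and frame-saving policy affects measured performance.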

During  the  year,  implementation  of  a  MARCH  simulator  was  completed,  the  ORBIT  compiler 
from  the  Mul-T  system  was  retargeted  to  generate  code  for  MARCH,  and  basic  versions 
of  the  frame  saver,  trap  handlers,  and  scheduler  were  written.  As  a  result,  it  became  possible 
to  run  actual  (but  small)  parallel  Lisp  application  programs  and  measure  the  performance 
effects  of  different  design  decisions.  The  need  for  improved  scheduling  performance  led  to 
the  definition  (by  H.  Takagi)  of  an  interprocessor  interrupt  mechanism  for  MARCH.  Experiments 
(by  H.  Takagi)  showed  that  grouping  the  scheduling  and  frame  saving  functions  into  a 
separate  housekeeper  task  generally  yields  better  performance  than  invoking  those  functions 
via  traps. 

As  a  result  of  the  year’s  activities,  considerable  progress  has  been  made  in  discovering  how  to 
achieve  MARCH’s  initial  goal  of  efficient  execution  of  parallel  Lisp  programs,  but  work  is  still 
needed  to  reduce  several  performance  costs.  Notably,  MARCH’s  HEP-like  instruction-issue 
mechanism  only  issues  an  instruction  when  the  previous  instruction  in  the  same  instruction 
stream  has  completed — this  requires  a  large  number  of  streams  to  fully  utilize  a  pipelined 
processor.  Also,  housekeeping  costs  are  still  too  high.  Although  the  Parallel  Processing 
Group  will  not  continue  in  existence  at  MIT,  we  hope  that  some  of  the  unfinished  work  on 
MARCH  may  continue  as  part  of  the  APRIL  project  [9]. 

Another  architectural  project  (pursued  by  I.  Vuong-Adlerberg)  concerned  the  design  of  a 
coherent  cache  for  MARCH.  The  cache  is  based  on  a  shared  bus,  which  is  made  a  split-transaction 
bus  for  increased  bandwidth.  MARCH’s  ability  to  interleave  several  independent 
instruction  streams  prevents  the  processor  from  systematically  becoming  blocked  on  every 
cache  miss,  but  presents  new  challenges  for  cache  design. 

Upon  a  cache  miss,  a  conventional  cache  remains  unavailable  for  further  processor  requests 
until  the  data  has  been  fetched  and  the  miss  has  been  completely  processed.  A  new  cache 
design  is  needed  for  a  multithreaded  processor  like  MARCH,  which  will  allow  the  processor 
to  continue  to  access  the  cache  even  while  misses  are  being  processed.  The  use  of  a 
split-transaction  bus  further  complicates  the  design  challenge  by  increasing  the  variety  of 
asynchronous  events  to  which  the  cache  may  need  to  attend.  A  cache  design  to  meet  this 
challenge  was  developed  [305],  the  kind  of  coherence  that  it  provides  was  formally  defined, 
and  the  validity  of  the  cache  design  was  proven.  The  literature  offered  very  little  pre-existing 
support  for  such  proofs,  so  a  formalism  was  also  developed  for  expressing  statements  about 
consistency  and  aiding  in  their  proof  [305]. 


9.1  Introduction 


Research  in  the  Programming  Methodology  Group  has  continued  to  focus  on  the  area  of 
distributed  computing.  In  addition  to  our  work  on  Argus  and  Mercury,  we  have  also  studied 
replication  methods,  implementation  of  distributed  applications,  and  theory  of  distributed 
systems.  Our  research  in  these  areas  is  described  below. 


9.2  Argus  and  Mercury 


We  have  continued  our  study  of  how  to  extend  Argus  to  provide  access  to  Mercury  mechanisms. 
One  issue  is  how  to  relate  Argus  types  to  Mercury  types.  The  mechanism  must 
support  two  activities:  building  relationships  between  Argus  and  Mercury  types,  and  indicating 
what  relationships  to  use  in  making  a  remote  call.  Ideally,  the  method  chosen 
should  make  it  easy  to  do  things  in  a  standard  way,  yet  make  it  possible  to  relate  types  in 
a  nonstandard  way. 

Building  a  relationship  is  done  by  defining  an  association,  which  contains  two  translation 
functions,  one  mapping  from  an  Argus  type  to  a  Mercury  type  and  the  other  mapping  in 
the  opposite  direction.  Typically,  an  association  will  be  defined  as  part  of  implementing 
an  abstract  type,  although  it  is  possible  to  define  one  independently  as  well.  Argus  will 
provide  a  number  of  builtin  associations  that  relate  builtin  types  to  Mercury  types.  No 
restrictions  are  placed  on  the  number  of  associations  for  a  type;  instead,  an  Argus  type  can 
be  associated  with  many  Mercury  types  and  vice  versa.  For  each  Argus  type,  one  association 
can  be  declared  the  default. 

Indicating  what  associations  to  use  in  making  a  call  is  done  as  part  of  the  declaration  of  the 
type  of  the  remote  procedure  being  called.  If  a  default  association  is  desired  for  a  particular 
parameter,  only  the  Argus  type  need  be  given  for  that  parameter.  Otherwise,  the  association 
to  be  used  is  indicated  explicitly.  For  example, 

h:  handlertype  (char)  returns  (int$to_int16)  signals  (over(real)) 

indicates  that  in  calls  of  h,  the  default  association  should  be  used  for  the  character  argument, 
and  also  for  the  real  result  in  the  case  where  the  exception  over  is  signaled.  However,  if  the 
call  returns  normally,  the  association  int$to_int16  should  be  used  to  map  the  16  bit  integer 
returned  by  the  call  into  an  Argus  int.  (The  default  association  for  ints  maps  them  to  32  bit 
integers.) 
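
The association mechanism can be illustrated outside Argus. The following Python sketch is not Argus or Mercury code. The names Association, Registry, INT_TO_INT32 and INT_TO_INT16, and the big-endian byte encodings, are hypothetical choices made only for illustration. It models an association as a pair of translation functions, allows one association per type to be marked as the default, and uses an explicitly named association where one is given.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class Association:
    """A pair of translation functions between a local type and a wire type."""
    name: str
    to_wire: Callable[[Any], bytes]     # local value -> wire representation
    from_wire: Callable[[bytes], Any]   # wire representation -> local value


class Registry:
    """Records which association is the default for each local type."""

    def __init__(self) -> None:
        self._defaults: Dict[type, Association] = {}

    def set_default(self, local_type: type, assoc: Association) -> None:
        self._defaults[local_type] = assoc

    def resolve(self, local_type: type,
                explicit: Optional[Association] = None) -> Association:
        # An explicitly named association overrides the default for the type.
        return explicit if explicit is not None else self._defaults[local_type]


# Hypothetical encodings: big-endian two's-complement integers.
INT_TO_INT32 = Association(
    "int$to_int32",
    lambda v: struct.pack(">i", v),
    lambda b: struct.unpack(">i", b)[0],
)
INT_TO_INT16 = Association(
    "int$to_int16",
    lambda v: struct.pack(">h", v),
    lambda b: struct.unpack(">h", b)[0],
)

registry = Registry()
registry.set_default(int, INT_TO_INT32)   # default: ints map to 32-bit integers


def decode_result(wire: bytes, explicit: Optional[Association] = None) -> int:
    """Translate an integer result using the explicit or the default association."""
    return registry.resolve(int, explicit).from_wire(wire)
```

In this sketch, a call such as decode_result(wire, INT_TO_INT16) plays the role of the explicit int$to_int16 annotation on the result of h, while decode_result(wire) falls back to the default.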


9.3  Formal  Models  for  Nested  Transactions 

William  Weihl,  working  with  Nancy  Lynch,  Michael  Merritt,  and  Alan  Fekete,  has  continued 
working  on  formal  models  for  nested  transactions.  In  the  last  year,  we  have  completed  a 
draft of a book, the goal of which is to unify and generalize the work done over the past
several years on modeling algorithms for transaction processing in distributed systems. Our
attempts  at  unification  have  resulted  in  proofs  of  a  number  of  interesting  algorithms,  all 
in  the  same  general  framework.  The  development  of  the  general  model  has  also  resulted 
in  the  invention  of  new  algorithms  that  generalize  and  extend  existing  algorithms,  both  to 
handle  nested  transactions  and  to  permit  more  concurrency  than  is  permitted  by  existing 
algorithms.  We  have  also  defined  correctness  conditions  for  implementations  of  atomic  data 
types  in  languages  such  as  Argus.  These  correctness  conditions  serve  as  useful  guidelines 
during  program  design  and  implementation,  in  much  the  same  manner  as  loop  invariants 
can  be  used  for  sequential  programs. 


9.4  Recovery  Algorithms 


William Weihl has continued work on formal models and verification techniques for recovery
algorithms for transaction systems. Most previous work treats concurrency control and
recovery as independent problems. In practice, however, designing a concurrency control
algorithm requires careful consideration of the details of recovery. We have developed a model
for transaction processing systems that allows concurrency control and recovery algorithms
to be described abstractly in simple mathematical terms. The interactions between concurrency
control and recovery can then be analyzed relatively simply. In a separate step, the
implementations of the concurrency control and recovery algorithms can each be shown to
implement the more abstract descriptions used to analyze their interactions. We have used
the model to analyze the constraints placed by two separate recovery algorithms,
update-in-place and deferred-update, on conflict-based concurrency control algorithms. We have
proved necessary and sufficient conditions for a concurrency control algorithm to work with
each of the recovery algorithms. These conditions are interesting for several reasons. First,
they give precise bounds on the level of concurrency permitted by each recovery method.
Second, they directly lead to new concurrency control algorithms that permit more concurrency
than previously existing algorithms. Third, the two recovery algorithms are incomparable
in terms of the constraints each places on concurrency control: each permits concurrency
control algorithms that the other does not. These results are described in a paper in the
Proceedings of the 1989 Symposium on Principles of Database Systems.
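
To make the distinction between the two recovery methods concrete, the following Python sketch contrasts them on a simple key-value store. It is an illustration under stated assumptions, not the formal model described above: the class names, the dictionary store, and the restriction to plain writes are hypothetical simplifications. Update-in-place changes the store immediately and keeps an undo log, whereas deferred-update keeps a private list of intended writes and installs them only at commit.

```python
class UpdateInPlace:
    """Apply each write to the store at once; keep an undo log for aborts."""

    def __init__(self, store):
        self.store = store
        self.undo = {}   # transaction id -> list of (key, existed, old value)

    def write(self, txn, key, value):
        record = (key, key in self.store, self.store.get(key))
        self.undo.setdefault(txn, []).append(record)
        self.store[key] = value

    def read(self, txn, key):
        # Every transaction sees the current store, including uncommitted writes.
        return self.store.get(key)

    def commit(self, txn):
        self.undo.pop(txn, None)          # nothing to install, only forget the log

    def abort(self, txn):
        # Undo the transaction's writes in reverse order.
        for key, existed, old in reversed(self.undo.pop(txn, [])):
            if existed:
                self.store[key] = old
            else:
                self.store.pop(key, None)


class DeferredUpdate:
    """Buffer writes per transaction; install them in the store at commit."""

    def __init__(self, store):
        self.store = store
        self.intentions = {}   # transaction id -> {key: value}

    def write(self, txn, key, value):
        self.intentions.setdefault(txn, {})[key] = value

    def read(self, txn, key):
        # A transaction sees its own intended writes, then the committed store.
        own = self.intentions.get(txn, {})
        return own[key] if key in own else self.store.get(key)

    def commit(self, txn):
        self.store.update(self.intentions.pop(txn, {}))

    def abort(self, txn):
        self.intentions.pop(txn, None)    # nothing was installed, only forget
```

The difference in what a concurrent transaction can observe before commit (the uncommitted value under update-in-place, the committed value under deferred-update) gives some intuition for why each method constrains the concurrency control algorithm differently.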


9.5  Storage  Management  for  Persistent  Memory 


William Weihl and Elliot Kolodner have been working on efficient automatic storage
management for persistent memory. Crash recovery algorithms for databases require explicit
interaction with the recovery system to allocate and free stable objects, and do not cope
with objects that change locations. We have developed garbage collection algorithms for
crash-tolerant systems. The algorithms are described in a paper in the Proceedings of the
1989 SIGMOD conference. We are currently looking at incremental and generation-based
methods.


9.6 The Meaning of Subtypes

In  a  thesis  finished  in  December  1988  [199],  Gary  Leavens  describes  a  verification  method 
for  object-oriented  programs  that  use  subtypes.  Object-oriented  programs  are  polymorphic, 
in  the  sense  that  the  type  (or  class)  of  the  target  of  a  message  may  not  be  known  until  run 
time;  thus,  the  actual  code  that  will  be  run  as  the  result  of  an  invocation  is  not  statically 
determinable.  Leavens  presents  the  idea  of  a  “simulation  relation,”  which  explains  how  the 
objects  of  a  subtype  can  be  viewed  as  objects  of  a  supertype,  and  shows  how  simulation 
relations  can  be  used  to  verify  object-oriented  programs.  The  method  is  a  natural  extension 
of  standard  axiomatic  techniques  for  specifying  and  verifying  programs  that  use  abstract 
data  types. 


9.7  New  Replication  Method 

In  a  thesis  completed  in  May  1989  [192],  Rivka  Ladin  has  defined  a  new  replication  method. 
This  method  is  an  extension  of  our  earlier  work  [193]  on  replication  techniques.  Information 
is  stored  at  a  logically  centralized  service  that  is  accessible  to  clients  by  making  remote  calls 
on  its  operations.  To  make  the  service  highly  available,  so  that  it  is  likely  to  be  accessible  to 
clients  when  needed,  the  service  is  implemented  by  a  number  of  replicas.  Client  operations 
take  place  at  just  one  replica,  and  can  usually  be  processed  without  delay,  so  the  replication 
technique  does  not  slow  clients  down.  Update  operations,  which  modify  the  state  of  the 
service,  never  cause  a  delay;  the  replica  that  performs  the  update  communicates  the  new 
information  to  the  other  replicas  in  the  background  by  using  “gossip”  messages.  Query 
operations,  which  observe  the  state,  will  be  delayed  if  an  update  whose  effect  needs  to  be 
observed  is  not  yet  known  at  the  replica  processing  the  operation;  but  this  occurs  rarely. 

Ladin’s  method  allows  clients  to  indicate  how  operations  are  to  be  ordered.  Both  queries  and 
updates  can  be  required  to  occur  after  other  updates.  This  is  accomplished  by  associating 
each  update  with  a  unique  identifier  and  having  each  query  and  update  take  a  set  of  uids  as 
an  argument;  the  service  will  ensure  that  the  query  or  update  occurs  after  all  updates  whose 
uids  are  in  the  set. 
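
The client-specified ordering can be sketched in a few lines of Python. This is a simplified illustration, not Ladin's algorithm: the Replica class, its method names, the string uids, and the key-value state are all hypothetical, and the sketch ignores failures, log truncation, and the stronger orderings. An update is accepted immediately and returns a uid; a query names the uids it must follow and reports that it has to wait if the replica has not yet learned of them through gossip.

```python
import itertools
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

# (uids this update must follow, key, value)
Record = Tuple[FrozenSet[str], str, int]


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self._next = itertools.count(1)
        self.log: Dict[str, Record] = {}   # every update this replica has heard of
        self.applied: Set[str] = set()     # uids whose effects are in self.state
        self.state: Dict[str, int] = {}

    def update(self, key: str, value: int, after: Iterable[str] = ()) -> str:
        """Accept an update without delay and return its unique identifier."""
        uid = f"{self.name}-{next(self._next)}"
        self.log[uid] = (frozenset(after), key, value)
        self._apply_ready()
        return uid

    def query(self, key: str,
              after: Iterable[str] = ()) -> Tuple[bool, Optional[int]]:
        """Return (True, value) if every uid in `after` has been applied here,
        otherwise (False, None) to signal that the caller must wait."""
        if not set(after) <= self.applied:
            return (False, None)
        return (True, self.state.get(key))

    def gossip_to(self, other: "Replica") -> None:
        """Background propagation: send this replica's log to another replica."""
        for uid, record in self.log.items():
            other.log.setdefault(uid, record)
        other._apply_ready()

    def _apply_ready(self) -> None:
        # Apply every logged update whose predecessors have all been applied,
        # repeating until no further update becomes applicable.
        progress = True
        while progress:
            progress = False
            for uid, (after, key, value) in self.log.items():
                if uid not in self.applied and after <= self.applied:
                    self.state[key] = value
                    self.applied.add(uid)
                    progress = True
```

Updates issued at different replicas without any uid relating them may be applied in different orders at different replicas in this sketch, which corresponds to the case of parallel operations that the client-specified ordering leaves unordered.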

The  client-specified  ordering  does  not  provide  a  way  to  order  operations  that  occur  in  parallel. 
To  handle  this  case,  Ladin  defines  two  extensions  that  allow  stronger  orderings  to  be  defined. 
These  extensions  increase  the  applicability  of  the  method  so  that  it  can  be  used  in  systems 
where  most  operations  are  ordered  by  clients,  but  occasionally  a  stronger  order  is  required. 


9.8  Garbage  Collection  in  a  Distributed  System 


Also  as  part  of  her  thesis,  Ladin  has  defined  a  new  technique  for  doing  garbage  collection  of  a 
distributed  heap.  The  method  uses  a  centralized  service  to  keep  track  of  inter-node  references; 
the  service  is  made  highly  available  by  replication,  e.g.,  using  the  technique  described  above. 
The method allows nodes to garbage collect independently, using different algorithms if
desired. After doing a garbage collection, a node informs the service about its references to
other  nodes’  objects  and  inquires  about  the  accessibility  of  any  of  its  objects  that  might  be 
accessed  by  other  nodes.  The  method  is  thus  able  to  detect  cycles  of  inaccessible  objects. 
Using  the  central  service  has  several  advantages:  it  requires  fewer  messages  than  if  nodes 
communicate  with  one  another  directly;  it  offloads  work  from  the  clients  to  the  server,  thus 
freeing  up  the  clients  to  work  on  behalf  of  their  users;  and  it  scales  to  large  systems. 
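
The role of the central service can be illustrated with a small Python sketch. It is not the algorithm of the thesis, which must also deal with references in transit and with replication of the service. The ReferenceService class and the shape of each node's report (the remote objects reachable from its local roots, and, for each of its objects that other nodes might access, the remote objects reachable from it) are assumptions made for illustration. Because the service sees the inter-node references of all nodes, it can recognize a cycle that spans nodes and is reachable from no root.

```python
from typing import Dict, Iterable, Set


class ReferenceService:
    """Hypothetical central service that tracks inter-node references."""

    def __init__(self) -> None:
        # node -> remote objects reachable from that node's local roots
        self.rooted: Dict[str, Set[str]] = {}
        # node -> {public object on that node -> remote objects reachable from it}
        self.paths: Dict[str, Dict[str, Set[str]]] = {}

    def report(self, node: str, rooted: Iterable[str],
               paths: Dict[str, Iterable[str]]) -> None:
        """Called by a node after it finishes a local garbage collection."""
        self.rooted[node] = set(rooted)
        self.paths[node] = {obj: set(refs) for obj, refs in paths.items()}

    def accessible(self) -> Set[str]:
        """Objects reachable from some node's roots via inter-node references."""
        edges: Dict[str, Set[str]] = {}
        for node_paths in self.paths.values():
            for obj, targets in node_paths.items():
                edges.setdefault(obj, set()).update(targets)
        frontier = [obj for refs in self.rooted.values() for obj in refs]
        live: Set[str] = set()
        while frontier:
            obj = frontier.pop()
            if obj not in live:
                live.add(obj)
                frontier.extend(edges.get(obj, ()))
        return live

    def query(self, objects: Iterable[str]) -> Dict[str, bool]:
        """Answer a node's inquiry about which of its objects are still needed."""
        live = self.accessible()
        return {obj: obj in live for obj in objects}
```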


9.9  High  Availability  for  Linda 

In  a  thesis  completed  in  August  1988  [313],  Andrew  Xu  defined  a  method  for  providing 
high  availability  for  the  Linda  tuple  space  [72].  The  Linda  tuple  space  is  a  nonstandard 
memory model that requires less synchronization between reads and writes than a standard
model.  As  such,  it  is  of  interest  for  parallel  and  distributed  systems  because  the  reduced 
synchronization  can  translate  into  better  performance.  Previous  implementations  proposed 
for  Linda,  however,  do  not  support  high  availability:  if  any  node  containing  part  of  the 
tuple  space  fails,  the  memory  is  lost.  Xu’s  thesis  provides  an  efficient  implementation  that 
overcomes  this  problem. 

9.10  Optimistic  Concurrency  Control  in  Distributed  Systems 

In  a  thesis  completed  in  May  1989  [140],  Bob  Gruber  extended  optimistic  concurrency  control 
to  work  in  a  distributed  system  that  supports  nested  atomic  transactions.  His  thesis  describes 
two  methods.  The  first  uses  the  fixed  action  model  in  which  a  transaction  runs  entirely  at 
a  single  site  and  all  objects  that  it  uses  are  copied  to  that  site.  In  the  second,  the  fixed 
object  model,  objects  never  move;  instead  the  transaction  runs  at  the  sites  of  the  objects  it 
uses.  The  performance  of  the  two  methods  appears  to  be  roughly  the  same,  although  the 
algorithms  used  in  the  fixed  object  model  are  more  complicated.  The  fixed  action  model 
is  probably  the  more  interesting  of  the  two  because  it  matches  recent  work  in  distributed 
object-oriented databases, in which copies of the objects used by a client reside in the client’s
cache.
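
As a rough illustration of the fixed action model, the following Python sketch shows a generic optimistic scheme in which a transaction works on cached copies and is validated at commit. It is not the algorithm of the thesis: it handles only flat (non-nested) transactions on a single store, and the Store and Transaction classes and the use of per-object version numbers are assumptions introduced here.

```python
from typing import Any, Dict, Tuple


class Store:
    """Holds the committed (version, value) pair for each object."""

    def __init__(self) -> None:
        self.data: Dict[str, Tuple[int, Any]] = {}

    def fetch(self, key: str) -> Tuple[int, Any]:
        return self.data.get(key, (0, None))

    def try_commit(self, read_versions: Dict[str, int],
                   writes: Dict[str, Any]) -> bool:
        # Validation: every object read must still be at the version that was read.
        for key, version in read_versions.items():
            if self.fetch(key)[0] != version:
                return False
        # Install the writes, advancing each object's version.
        for key, value in writes.items():
            self.data[key] = (self.fetch(key)[0] + 1, value)
        return True


class Transaction:
    """Runs at one site against cached copies; no locks are taken while it runs."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.read_versions: Dict[str, int] = {}
        self.cache: Dict[str, Any] = {}
        self.writes: Dict[str, Any] = {}

    def read(self, key: str) -> Any:
        if key in self.writes:
            return self.writes[key]
        if key not in self.cache:
            version, value = self.store.fetch(key)   # copy the object to this site
            self.read_versions[key] = version
            self.cache[key] = value
        return self.cache[key]

    def write(self, key: str, value: Any) -> None:
        self.writes[key] = value

    def commit(self) -> bool:
        return self.store.try_commit(self.read_versions, self.writes)
```

If two transactions read the same object and both try to update it, the first to commit succeeds and the second fails validation and must be retried.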


9.11  Stable  Storage  Service 

In a thesis finished in May 1989 [83], Jeff Cohen designed and partially implemented a new
stable storage system for Argus. The current implementation of Argus uses a disk at each
node  to  store  the  stable  information  of  all  guardians  at  that  node.  This  means  that  our 
stable  storage  is  not  truly  stable,  since  a  failure  of  a  single  disk  can  cause  the  loss  of  all 
stable  information  for  that  node.  True  stable  storage  would  require  two  disks  per  node,  and 
the  time  needed  to  write  to  stable  storage  would  be  high,  since  each  stable  write  would  need 
to  be  done  to  both  disks  and  the  two  disk  writes  must  happen  sequentially  [195]. 

To  reduce  these  costs,  Cohen  worked  on  the  new  system,  which  is  based  on  [92].  In  his 
system, stable storage is provided by a stable storage service that can be accessed across
the network. The service consists of three server nodes. To write or read information at the
service,  the  Argus  system  at  a  guardian  must  communicate  with  any  two  of  these  servers. 
Each  server  has  a  large  disk  and  an  uninterruptible  power  supply.  The  power  supply  allows 
the  server  to  handle  a  write  request  entirely  in  primary  memory;  the  new  information  in 
the  write  request  is  written  to  disk  later  in  background  mode.  This  means  that  a  write  to 
an  individual  server  takes  a  length  of  time  roughly  equal  to  a  roundtrip  message  delay.  We 
expect  the  new  service  to  be  faster  than  the  current  Argus  system  because  the  writes  to  the 
two  servers  can  be  done  in  parallel  and  the  network  roundtrip  delay  in  a  local  area  net  is 
less  than  the  delay  for  a  disk  write. 
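
The requirement to communicate with any two of the three servers can be sketched as follows. The Python below is an illustration only, not Cohen's design: the Server and StableStorageClient classes, the version numbers attached to each value, and the sequential loop over servers (a real client would contact them in parallel) are assumptions made here. Because any two groups of two servers out of three share at least one server, a read that hears from two servers always sees the most recent successful write.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, Optional[bytes]]   # (version, value)


class Server:
    """One storage server; `memory` stands in for its battery-backed memory."""

    def __init__(self) -> None:
        self.up = True
        self.memory: Dict[str, Record] = {}

    def write(self, key: str, version: int, value: bytes) -> None:
        if not self.up:
            raise ConnectionError("server unavailable")
        if version > self.memory.get(key, (0, None))[0]:
            self.memory[key] = (version, value)

    def read(self, key: str) -> Record:
        if not self.up:
            raise ConnectionError("server unavailable")
        return self.memory.get(key, (0, None))


class StableStorageClient:
    QUORUM = 2   # any two of the three servers

    def __init__(self, servers: List[Server]) -> None:
        self.servers = servers

    def _latest(self, key: str) -> Record:
        replies = []
        for server in self.servers:
            try:
                replies.append(server.read(key))
            except ConnectionError:
                continue
        if len(replies) < self.QUORUM:
            raise RuntimeError("fewer than two servers reachable")
        return max(replies, key=lambda record: record[0])

    def read(self, key: str) -> Optional[bytes]:
        return self._latest(key)[1]

    def write(self, key: str, value: bytes) -> None:
        version = self._latest(key)[0] + 1
        acks = 0
        for server in self.servers:
            try:
                server.write(key, version, value)
                acks += 1
            except ConnectionError:
                continue
        if acks < self.QUORUM:
            raise RuntimeError("write not acknowledged by two servers")
```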
[8] B. Liskov. Introduction to Mercury. Lecture given at Siemens, February 1989.


Programming Systems Research

10.1  Introduction 


The Programming Systems Research Group has made progress in two areas during the
1988-89 year. The group has worked on a new programming model for parallel computation
involving  the  notion  of  an  effect  system.  Furthermore,  the  group  has  enhanced  the  function 
of  the  distributed  database  system  that  we  have  designed  and  implemented. 

The prototype language FX embodies a new programming model for parallel computation that
combines the good features of both imperative and functional programming languages. It
uses an effect system to investigate the use of effect specifications for controlling concurrency.
An effect system is analogous to a type system (as found in many programming languages);
but whereas types describe what results from a computation, effects classify how the
computation proceeds. Our system deduces information that will enable the efficient parallel
implementation of a broad class of polymorphic programming languages.
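
The analogy between types and effects can be illustrated with a toy example. The Python sketch below is not FX and does not reproduce its effect system: the four expression forms, the region names, and the functions infer, interferes and can_run_in_parallel are hypothetical. For each expression it computes both a type (what results) and a set of read and write effects on named regions (how the computation proceeds), and it uses the effects to decide whether two expressions could safely be evaluated in parallel.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

Effect = Tuple[str, str]   # ("read" or "write", region name)


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Get:                 # read the integer stored in a region
    region: str


@dataclass(frozen=True)
class Put:                 # store the value of an expression into a region
    region: str
    value: "Expr"


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Get, Put, Add]


def infer(e: Expr) -> Tuple[str, FrozenSet[Effect]]:
    """Return the type of an expression together with its effects."""
    if isinstance(e, Const):
        return "int", frozenset()                      # pure
    if isinstance(e, Get):
        return "int", frozenset({("read", e.region)})
    if isinstance(e, Put):
        value_type, value_effects = infer(e.value)
        if value_type != "int":
            raise TypeError("Put expects an int value")
        return "unit", value_effects | {("write", e.region)}
    if isinstance(e, Add):
        (lt, le), (rt, re_) = infer(e.left), infer(e.right)
        if lt != "int" or rt != "int":
            raise TypeError("Add expects int operands")
        return "int", le | re_
    raise TypeError(f"unknown expression: {e!r}")


def interferes(a: FrozenSet[Effect], b: FrozenSet[Effect]) -> bool:
    """Two effect sets interfere if one writes a region the other touches."""
    written_a = {region for kind, region in a if kind == "write"}
    written_b = {region for kind, region in b if kind == "write"}
    touched_a = {region for _, region in a}
    touched_b = {region for _, region in b}
    return bool(written_a & touched_b or written_b & touched_a)


def can_run_in_parallel(e1: Expr, e2: Expr) -> bool:
    return not interferes(infer(e1)[1], infer(e2)[1])
```

Two reads of the same region do not interfere, whereas a write and a read of the same region do, so only the former pair would be a candidate for parallel evaluation in this sketch.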

During  the  past  year,  our  investigations  with  the  prototype  implementation  have  focused  on: 


•  the  design  and  implementation  of  FX  subsets,  one  of  which  is  used  in  the  graduate 
programming  language  course; 

•  the design and testing of optimistic inference algorithms for side effect estimation of
FX expressions;

•  the development of effect specifications for message passing concurrency and first class
continuations.

We also developed the Boston Community Information System (BCIS) in a continuation
of our work from last year. BCIS is a large scale information system that is in use at
over 150 sites in the Boston area. The system was improved during the last year with the
addition of an electronic mail interface to the text-based article retrieval system. The goal
of our research with BCIS is to explore how the broadcast system architecture can be used
to implement information systems which can support very large user populations, perhaps
up to one million users.

10.2  Community  Information  System 

During  the  1988-89  year,  we  continued  to  run  the  Boston  Community  Information  System 
experiment.  This  experiment  provides  New  York  Times  and  Associated  Press  news  wires 
to our users. It consists of three main programs: a PC based version (BCIS), a TCP/IP
program (Walter) and an electronic mail based system (The Clipping Service). Over the
past year, over 200 Boston area homes, 40 Internet hosts and 60 electronic mail users have
participated in our experiment.

The experiment also led us to explore other platforms. An Apple Macintosh version
of the service has been prototyped and an X Window System implementation of the
Walter database query application has been completed. XWalter provides a “point-and-click”
interface to novice users. Furthermore, Clipsend, the electronic mail portion of the
experiment, is continuing to be improved to run more reliably and support a large user
community. We already have users in all areas of the world, such as Japan, Switzerland, and
France. We expect to be able to support well over 100 users by the end of next year.

Our experiment to charge our PC participants five dollars per month for the broadcast service
has been successful; over two-thirds of the user population continues to participate. During
the past year, we have spent time exploring the benefits of the technology and improving the
documentation for the existing services. The experimental data report analyzed the immense
amount of feedback we received from our users, and found that the Boston Community
Information System provides a useful complement to existing media forms and has proved
valuable to the many users in the test population.


10.3  FX  Effect  Analysis 


The notion of effect system has been extended to deal with different aspects of compile-time
analysis of programs:


•  An extension to introduce the so-called “control effects” has been developed. This
technique allows the introduction of first class continuations in the FX-87 programming
language.

•  Explicit parallelism can also be incorporated in a language that uses an effect system.
A message-based extension to FX-87 has been designed and implemented on top of the
experimental FX-87 interpreter.


A  major  redesign  of  the  FX  programming  language  is  under  way  in  the  PSR  Group: 


•  A complete draft reference manual of this new version of FX has been written; Pierre
Jouvelot is one of the co-editors of this specification.

•  This design introduces first class modules. Among them, a vector facility inspired by
Fortran 8X and the Scan Model has been proposed.

10.3.1  Reasoning  about  Continuations  with  Control  Effects 

First class continuations add a great deal of expressive power to a programming language as
they permit the implementation of a wide variety of control structures, including jumps, error
handlers, and coroutines. With this power come substantial semantic and implementational
complexities. Thus it would be very useful to be able to precisely identify which expressions
in a program use first class continuations and in what manner.

We have developed a new static analysis method for first class continuations that uses an
effect system to classify the control domain behavior of expressions in a typed polymorphic
language.  We  introduce  two  new  control  effects,  goto  and  comefrom,  that  describe  the 
control  flow  properties  of  expressions.  An  expression  that  does  not  have  a  goto  effect  is  said 
to  be  continuation-following  because  it  will  always  call  its  return  continuation.  An  expression 
that does not have a comefrom effect is said to be continuation-discarding because it will never
preserve  its  return  continuation  for  later  use.  Unobservable  control  effects  can  be  masked 
by  the  effect  system.  Control  effect  soundness  theorems  guarantee  that  the  effects  computed 
statically  by  the  effect  system  are  a  conservative  approximation  of  the  dynamic  behavior  of 
an  expression. 

The  effect  system  that  we  describe  performs  certain  kinds  of  control  flow  analysis  that  were 
not previously feasible. This analysis can enable a variety of compiler optimizations, including
parallel expression scheduling in the presence of complex control structures. This control
effect  system  has  been  implemented  in  the  context  of  the  FX-87  programming  language. 

10.3.2  Communication  Effects  for  Message-based  Concurrency 

Although a fair amount of parallelism can be automatically extracted from sequential
programs by smart compilers, there are some problems for which an explicitly parallel algorithm
is more natural to express and easier to implement efficiently. There are numerous parallel
paradigms that can be added to an otherwise sequential language to fulfill that goal, such as
message passing, systolic programming, and fork/join models. We have developed a
message-based communication framework based on communication effects. Communication effects
are used to describe the communication behavior of expressions in a typed polymorphic
programming language. Concurrency occurs between processes connected by channels on which
messages  are  transmitted.  Communication  operations  are  characterized  by  two  operators, 
out  and  in,  depending  on  whether  a  message  has  been  sent  or  received.  Synchronization 
is  only  allowed  by  message  passing  along  shared  channels;  communication  via  mutation  of 
global  variables  is  strictly  prohibited  by  our  communication  effect  system,  thus  restricting 
the  amount  of  nondeterminacy  in  user  programs. 

Communication effects permit a programmer to express concurrency in a rather flexible way
while preserving the correctness of implicit detection of parallelism and optimization by the
compiler. This system is powerful enough to express many other parallel paradigms, like
systolic arrays or pipes. This new concurrency framework has been implemented in the
FX-87 programming language.

10.3.3  Polymorphism  and  Side  Effects 

We  have  been  engaged  in  a  project  to  develop  a  programming  model  for  parallel  computation 
which  combines  the  best  features  of  imperative  and  functional  programming  languages.  The 
effect  system  which  we  have  developed  is  analogous  to  a  type  system,  and  provides  an 
algebraic framework for describing the behaviour of computations.

As reported last year, a prototype implementation of the FX programming language has been
built. Our investigations in the area of type reconstruction systems and our performance
experiments have led us to investigate several new applications of effect systems:


•  a static checking system for abstract type constructors and destructors which facilitates
the  convenient  use  of  modular  data  implementations;  and 

•  a more flexible static type reconstruction system which may permit memory
representation optimizations.


Type  Reconstruction  for  Pattern  Matching 


We have developed a typing system which permits data constructor and destructor
procedures to be used as first class values; this typing system permits first class procedures to
double as pattern-match operators and generalizes the notion of pattern-matching.


Typechecking  Polymorphic  Expressions  with  Side  Effects 

We  have  investigated  a  simple  typing  system  which  associates  side  effects  with  type  variables 
in  order  to  permit  polymorphism  in  the  types  of  some  expressions  which  perform  side  effects. 

10.3.4  FX  Large  Project  Programming 

We  have  continued  work  on  the  FX  module  system.  FX  modules  allow  abstract  types, 
transparent  types,  and  values  to  be  packaged  together  into  first  class  modules.  A  system  of 
static dependent types guarantees type safety in the presence of first class modules. We use
the FX effect system to guarantee type safety in the presence of side effects. Our system,
combined with a simple facility for reading (module) values from files, obviates the need for
a separate linking language (as in ML or Clu).

This  past  year,  we  integrated  this  module  system  into  our  new  prototype  FX  implementation. 
Our  type  reconstruction  system  allows  many  declarations  within  modules  to  be  omitted  and 
permits  clients  of  modules  to  benefit  from  implicit  polymorphism  of  values  exported  by 
modules. 

Our  implementation  packages  the  standard  types  and  operations  into  a  large  FX  module 
that is available to the programmer. This allows our language design and implementation to
be more modular by separating the language kernel from these standard language features.
Modules also provide a convenient way to experiment with new language features as well as
new versions of older features.

We are continuing to refine the module system design by investigating:

•  more concise methods of combining modules to form larger ones,

•  support  for  common  idioms  (like  ML’s  datatype),  and 

•  support  for  persistent  module  storage. 


10.4  Publications 

[1]  D.K. Gifford and P. Jouvelot. Parallel functional programming: the FX project. In
International Workshop on Parallel and Distributed Algorithms, Bonas, France,
North-Holland, October 1988.

[2]  R.T. Hammel and D.K. Gifford. FX-87 Performance Measurements: Dataflow
Implementation. Technical Report MIT/LCS/TR-421, MIT Laboratory for Computer
Science, November 1988.

[3]  P. Jouvelot and D.K. Gifford. Reasoning about continuations with control effects. In
Proceedings of SIGPLAN ’89 Conference on Programming Language Design and
Implementation, Portland, OR, June 1989.

[4]  P. Jouvelot and D.K. Gifford. Communication Effects for Message-based Concurrency.
Technical Memo MIT/LCS/TM-386, MIT Laboratory for Computer Science, 1989.

[5]  P.  Jouvelot  and  D.K.  Gifford.  The  FX-87  interpreter.  In  Proceedings  of  the  Second 
IEEE  International  Conference  on  Computer  Languages,  Miami  Beach,  FL,  October 
1988. 

[6]  P. Jouvelot and V. Dornic. FX-87, or what comes after Scheme? Special Edition of the
BIGRE Bulletin, AFCET, France, May 1989.

[7]  J.W. O’Toole, Jr. and D.K. Gifford. Type reconstruction with first class polymorphic
values. In Proceedings of the SIGPLAN ’89 Conference on Programming Language
Design and Implementation, Portland, OR, June 1989.

[8]  J.W.  O’Toole,  Jr.  A  Comparison  of  Polymorphic  Type  Abstraction  Rules.  Technical 
Memo  MIT/LCS/TM-380,  MIT  Laboratory  for  Computer  Science,  1988. 

[9]  M.A.  Sheldon  and  D.K.  Gifford.  Static  dependent  types  for  first-class  modules.  ACM 
’90  Conference  on  Lisp  and  Functional  Programming.  Submitted  for  publication. 

Theses  Completed 

[1]  R.T. Hammel. An FX-87 Compiler for a Dataflow Machine. Master’s thesis, MIT
Department of Electrical Engineering and Computer Science, September 1988.

[2]  J. O’Toole. Polymorphic Description Reconstruction. Master’s thesis, MIT Department
of Electrical Engineering and Computer Science, May 1989.

[3]  K. Peltonen. A Gnu-Emacs Interface to the Community Information Systems Project.
Bachelor’s thesis, MIT Department of Electrical Engineering and Computer Science,
May 1989.

[4]  D.  A.  Segal.  Macnews:  An  Interactive  News  Retrieval  Service  for  the  Macintosh. 
Bachelor’s  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science, 
May  1989. 


[5]  M.  Sheldon.  Static  Dependent  Types  for  First-class  Modules.  Master’s  thesis,  MIT 
Department  of  Electrical  Engineering  and  Computer  Science,  May  1989. 

[6]  L.  Zhang.  A  New  Architecture  for  Packet  Switching  Network  Protocols.  PhD  thesis, 
MIT  Department  of  Electrical  Engineering  and  Computer  Science,  June  1989. 


Theses  in  Progress 

[1]  L. LeMaire. An Architecture for Software Reusability Tools. Master’s thesis, MIT
Department  of  Electrical  Engineering  and  Computer  Science,  expected  September  1990. 

[2]  J. O’Toole. PhD thesis, MIT Department of Electrical Engineering and Computer
Science,  expected  May  1992. 

[3]  J.  Rauen.  A  Module  System  for  Scheme.  Master’s  thesis,  MIT  Department  of  Electrical 
Engineering  and  Computer  Science,  expected  May  1990. 

[4]  M. Sheldon. An Automatically Indexed Module Repository. PhD thesis, MIT Department
of Electrical Engineering and Computer Science, expected May 1992.


Talks 

[1]  D.  Gifford.  Broadcast  technology  for  information  systems.  MIT  ILO  presentation  to 
KDD  Corporation,  MIT,  Cambridge,  MA,  October  1988. 

[2]  D.  Gifford.  Broadcast  technology  for  information  systems.  MIT  ILO  presentation  to 
Bell-Northern  Research,  MIT,  Cambridge,  MA,  March  1989. 

[3]  P. Jouvelot. Pragmatics in parallel functional programming. Lecture given at Second
FIRTECH Symposium, Paris, France, November 1988.

[4]  P. Jouvelot. Reasoning about continuations with control effects. Lecture given at
SIGPLAN ’89 PLDI, Portland, OR, June 1989.

[5]  P. Jouvelot. Boston community information system. MIT ILO presentation to NEC,
MIT, Cambridge, MA, August 1988.

[6]  J. O’Toole. Type reconstruction with first class polymorphic values. Lecture given at
SIGPLAN ’89 PLDI, Portland, OR, June 1989.

[7]  J. O’Toole. Distributed databases using broadcast media. MIT ILO presentation to
Matsushita Electric Corporation, MIT, Cambridge, MA, August 1988.

[8]  J. O’Toole. Modular polymorphic programming with effects. MIT ILO presentation to
NEC, MIT, Cambridge, MA, April 1989.


Spoken  Language  Systems 


Research  Staff 

J.  Glass  S.  Seneff 

M.  Phillips  V.  Zue,  Group  Leader 


Technical  Staff 
D.  Goodine  K.  Isaacs 


Graduate  Students 


N.  Daly 
R.  Kassel 
H.  Leung 
J.  Marcus 
H.  Meng 


J.  Pitrelli 
D.  Rtischev 
M.  Soclof 
S.  Trowbridge 


Undergraduate  Students 

W.  Foster  C.  Pao 

M.  McCandless  D.  Whitney 


Support  Staff 
V.  Palay 

Visitors 


S.  Ono  K.  Takeda 


Spoken  Language  Systems 

11.1  Introduction 


Spoken language input to computers is a major goal in our research in developing a graceful
human-machine interface. Despite some recent successful demonstrations of speech recognition
capabilities, current systems typically fall far short of human capabilities of continuous
speech  recognition  with  essentially  unrestricted  vocabulary  and  speakers,  under  difficult 
acoustic  environments.  Our  approach  to  this  problem  is  to  seek  a  good  understanding  of 
human  communication  through  spoken  language,  to  capture  the  essential  features  of  the 
process  in  appropriate  models,  and  to  develop  the  necessary  computational  framework  to 
make  use  of  these  models  for  machine  understanding. 

It  is  our  belief  that  the  development  of  advanced  human/machine  communication  systems 
will  require  expertise  in  signal  processing,  system  theory,  pattern  recognition,  and  computer 
science,  built  on  a  solid  understanding  of  speech  science  and  linguistics.  We  place  heavy 
emphasis  on  designing  systems  that  can  make  use  of  the  knowledge  gained  over  the  past 
four  decades  in  human  communication,  with  hope  that  such  systems  will  one  day  have  a 
performance approaching that of humans. Specifically, our approach is based on the following
premises:


•  The speech signal contains information regarding the intended linguistic message. It
also contains information on the acoustic environment and the identity and
physiological/psychological states of the speaker. As far as speech recognition is concerned, the
latter sources of information can be considered as undesirable noise. Robust speech
recognition is critically tied to our ability to successfully extract the linguistic
information and discard those aspects that are extra-linguistic.

• Past research in spoken language communication has established phonemes as psychologically
real units for representing words in the lexicon. Therefore, phonemes and
other  equivalent  descriptors,  such  as  distinctive  features  and  syllables,  are  the  most 
appropriate  units  to  relate  words  to  the  speech  signal  for  machine  recognition  as  well. 

•  While  phonemes  are  discrete  abstract  linguistic  entities,  their  acoustic  realizations  in 
speech  are  inherently  continuous,  reflecting  the  movement  of  the  articulators  from  one 
position  to  the  next.  Many  of  the  acoustic  cues  for  phonetic  contrasts  are  encoded  at 
specific  times  in  the  speech  signal.  In  order  to  fully  utilize  these  acoustic  attributes, 
we  believe  that  one  must  explicitly  establish  acoustic  landmarks  in  the  signal. 

• Previous attempts at explicit utilization of speech knowledge have resulted in the
development of systems that are based on heuristic rules. Such efforts typically require
intense knowledge engineering, and as such are often hampered by the lack of a unified
control strategy. As a result, system development is slow, and the performance fragile.
In contrast, we seek to make use of the available speech knowledge by embedding such
knowledge in a formal framework whereby powerful mathematical tools can be utilized
to optimize its use.

• Despite significant advances made in phonetics, phonology, and other aspects of linguistics
over the past decades, we still lack a complete understanding of the human speech
communication  process.  To  deal  with  our  present  state  of  ignorance  and  the  inherent 
variability  that  exists  throughout  the  process,  the  speech  recognition  system  must  have 
a  stochastic  component.  However,  it  is  our  belief  that  speech-specific  knowledge  will 
enable  us  to  build  more  sophisticated  stochastic  models  than  what  is  currently  being 
attempted,  and  to  reduce  the  amount  of  training  data  necessary  for  high  performance. 

•  The  ultimate  goal  of  our  research  is  the  understanding  of  the  spoken  message,  and  the 
subsequent  accomplishment  of  a  task  based  on  this  understanding.  To  achieve  this 
goal,  we  must  fully  integrate  the  speech  recognition  part  of  the  problem  with  natural 
language  processing  so  that  higher  level  linguistic  and  pragmatic  constraints  can  be 
utilized. 

•  The  development  of  a  spoken  language  understanding  system  will  require  interactions 
with  several  disciplines  in  computer  science.  Parallel  computing  will  be  necessary  for 
real  time  processing.  Efficient  algorithms  can  greatly  reduce  the  search  space  for  the 
recognition  process.  Finally,  theories  of  learning  will  help  the  system  to  adapt  to  new 
speakers,  environments,  and  tasks. 


The  research  projects  in  the  Spoken  Language  Systems  Group  fall  into  several  areas.  First, 
a  number  of  basic  research  topics  are  being  explored.  These  include  the  formulation  and 
testing  of  various  computational  models  for  human  auditory  processing,  speech  perception, 
and  natural  language  processing,  suitable  for  spoken  language  understanding.  We  are  also 
attempting  to  quantify  the  acoustic  cues  for  phonetic  contrasts,  and  the  effects  of  speaking 
rate and style on the acoustic properties of speech. Secondly, these research results are
funneled into the development of an experimental spoken language system. Thirdly, alternative
approaches  to  speech  recognition,  including  the  use  of  artificial  neural  nets  and  strategies 
derived  from  vision  research,  are  being  explored.  Finally,  part  of  our  effort  is  devoted  to  the 
development  of  the  necessary  infrastructure,  including  the  development  of  speech  research 
tools  and  databases. 

The Spoken Language Systems Group was formed in January 1989, with members drawn
from the Speech Communication Group at the Research Laboratory of Electronics. While
the research described herewith was conducted primarily at LCS and is intended to cover
only the six-month period since January, some overlap with our earlier research activities
at RLE is unavoidable.


11.2  Research  Reports 

11.2.1  Continuous  Speech  Recognition:  The  SUMMIT  System 

Recently, we have put together a speech recognition system which embodies some of the
research that we have been conducting in automatic speech recognition. The system, which
we call SUMMIT, is intended to serve as a testbed for a segment-based approach to speech
recognition. In addition, it enables us to explore how speech recognition can be integrated
with natural language processing in order to achieve speech understanding.

Figure 11.1: Intermediate representation leading to the recognition of the sentence, “Where
is the nearest hospital?” The display contains: (a) a synchrony spectrogram, (b) a dendrogram
describing the multi-level acoustic segmentation, (c) a phonetic recognition network, (d) a
word pronunciation network, and (e) the recognition result.

The  SUMMIT  system  starts  the  recognition  process  by  first  transforming  the  speech  signal 
into  a  representation  that  models  the  known  properties  of  the  human  auditory  system  [283]. 
The representation is illustrated in Figure 11.1(a), for the sentence “Where is the nearest
hospital?”  Using  the  output  of  the  auditory  model,  acoustic  landmarks  of  varying  robustness 
are  located  and  embedded  in  a  hierarchical  structure  called  a  dendrogram  [123],  as  shown 
in  Figure  11.1(b).  The  acoustic  segments  in  the  dendrogram  are  then  mapped  to  phoneme 
hypotheses,  using  a  set  of  automatically  determined  acoustic  parameters  in  conjunction  with 
conventional  pattern  recognition  algorithms  [264].  The  result  is  a  phoneme  network,  in  which 
each  arc  is  characterized  by  a  vector  of  probabilities  for  all  the  possible  candidates,  as  shown 
in Figure 11.1(c).
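
The report does not spell out how the dendrogram is built. As a minimal sketch, assuming a
simple bottom-up procedure that repeatedly merges the most similar pair of temporally adjacent
regions, the idea can be illustrated as follows. The feature vectors, the Euclidean distance,
and the function name build_dendrogram are all assumptions made for illustration, not details
of SUMMIT.

```python
import math

def build_dendrogram(frames):
    """Merge adjacent regions bottom-up and return the list of merges.

    frames: list of feature vectors (lists of floats), one per initial region.
    Each merge is recorded as (start, end, distance), where start and end are
    inclusive indices of the initial regions covered by the new node.
    """
    # Each region: (start index, end index, mean vector, number of frames)
    regions = [(i, i, list(f), 1) for i, f in enumerate(frames)]
    merges = []
    while len(regions) > 1:
        # Only temporally adjacent regions are allowed to merge.
        dists = [math.dist(regions[i][2], regions[i + 1][2])
                 for i in range(len(regions) - 1)]
        i = min(range(len(dists)), key=dists.__getitem__)
        (s1, _, m1, n1), (_, e2, m2, n2) = regions[i], regions[i + 1]
        n = n1 + n2
        mean = [(a * n1 + b * n2) / n for a, b in zip(m1, m2)]
        merges.append((s1, e2, dists[i]))
        regions[i:i + 2] = [(s1, e2, mean, n)]
    return merges

# Hypothetical one-dimensional "acoustic" values for five initial regions.
frames = [[0.0], [0.1], [5.0], [5.2], [9.0]]
merges = build_dendrogram(frames)
for start, end, d in merges:
    print(f"merge regions {start}..{end} at distance {d:.2f}")
```

Merges made at small distances correspond to weak landmarks low in the hierarchy, while merges
made at large distances correspond to the more robust landmarks near the top.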

Words in the lexicon are represented as pronunciation networks, which are generated
automatically by a set of phonological rules. This is illustrated in Figure 11.1(d) for the word
“hospital.” Probabilities derived from training data are assigned to each arc to reflect the
likelihood of a particular pronunciation. Presently, lexical decoding is accomplished by using
the Viterbi algorithm to find the best path that matches the acoustic-phonetic network with
the lexical network. The recognized word string is shown in Figure 11.1(e).
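
The lexical decoding step can be sketched in a few lines. The following is a simplified
illustration, not the SUMMIT implementation: it assumes a linear sequence of acoustic segments
(rather than a full phonetic network), exactly one pronunciation arc per segment, and a
hypothetical toy pronunciation network with made-up probabilities.

```python
import math

def viterbi_score(segment_probs, arcs, start, final):
    """Best log-probability alignment of segments to a pronunciation network.

    segment_probs: list of dicts (phone -> acoustic probability), one per segment.
    arcs: list of (from_node, to_node, phone, arc_probability).
    Returns (log score, phone path) for the best path ending at `final`, or None.
    """
    best = {start: (0.0, [])}              # node -> (log score, phone path)
    for probs in segment_probs:
        nxt = {}
        for src, dst, phone, p_arc in arcs:
            if src not in best or probs.get(phone, 0.0) <= 0.0:
                continue
            score = best[src][0] + math.log(p_arc) + math.log(probs[phone])
            # Keep only the best-scoring way of reaching each node.
            if dst not in nxt or score > nxt[dst][0]:
                nxt[dst] = (score, best[src][1] + [phone])
        best = nxt
    return best.get(final)

# Hypothetical network: one word with two alternative middle phones.
arcs = [(0, 1, "h", 1.0), (1, 2, "aa", 0.7), (1, 2, "ao", 0.3), (2, 3, "s", 1.0)]
segments = [{"h": 0.9, "s": 0.1}, {"aa": 0.4, "ao": 0.6}, {"s": 0.8, "z": 0.2}]
result = viterbi_score(segments, arcs, start=0, final=3)
print(result)
```

In this toy case the acoustically weaker phone "aa" still wins, because its pronunciation arc
is more probable; this is the kind of trade-off that arc probabilities are meant to capture.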

Figure 11.2: Rank order statistics for the current phone classifier on a speaker-independent
task. There are 38 context-independent phone labels: 14 vowels, 3 semivowels, 3 nasals, 8
fricatives, 2 affricates, 6 stops, 1 flap, and one for silence.

We recently evaluated SUMMIT’s performance in a number of ways. Phonetic classification
performance was evaluated by comparing the labels provided by the classifier to those in a
time-aligned transcription, using 38 context-independent phone labels [318]. This particular
set was selected because it has been used in other recent evaluations within the DARPA
community. For a single speaker, the top-choice classification accuracy was 77%, and the
correct label was within the top three nearly 95% of the time. For multiple and unknown
speakers, the top-choice accuracy was about 70%, and the correct choice was within the top
three over 90% of the time. Figure 11.2 shows the rank order statistics for both the
speaker-dependent and speaker-independent cases.
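
Rank order statistics of the kind quoted above can be computed as in the following sketch.
The function name and the four-token example are hypothetical; only the definition (the
fraction of tokens whose reference label appears among the top n hypotheses) follows the text.

```python
def rank_order_accuracy(ranked_hypotheses, reference_labels, max_rank=3):
    """Fraction of tokens whose correct label is within the top n, for n = 1..max_rank."""
    hits = [0] * max_rank
    for ranked, ref in zip(ranked_hypotheses, reference_labels):
        for n in range(1, max_rank + 1):
            if ref in ranked[:n]:
                hits[n - 1] += 1
    total = len(reference_labels)
    return [h / total for h in hits]

# Hypothetical classifier output: each list is ordered from best to worst guess.
ranked = [["iy", "ih", "eh"], ["s", "z", "sh"], ["m", "n", "ng"], ["aa", "ao", "ah"]]
refs = ["iy", "z", "ng", "er"]
accuracy = rank_order_accuracy(ranked, refs)
print(accuracy)
```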

Word  accuracy  for  the  SUMMIT  system  was  evaluated  during  February  on  the  DARPA 
1000-word  Resource  Management  task  [319].  Two  different  speaker-independent  test  sets 
provided  by  NIST,  consisting  of  150  and  300  sentences,  respectively,  were  used  [252].  The 
SUMMIT  system  achieved  a  word  accuracy  of  87%  on  both  test  sets,  using  the  designated 
word-pair  grammar  with  perplexity  of  60,  and  approximately  70  context-independent  phone 
models. SUMMIT’s performance compares favorably with systems that are based on hidden
Markov modeling, when evaluated on the same data and using a similar number of phone
models [200]. Since other researchers have been able to improve their system’s performance
by increasing the number of models to accommodate context-dependency, we expect that
we can similarly improve SUMMIT’s performance.

Currently the SUMMIT system is implemented on a Symbolics Lisp Machine augmented
with an FPS array processor, and runs in several hundred times real time. Over the next
year, we will begin to port the system to a faster platform, in conjunction with the
development of dedicated hardware, to achieve near real-time performance.

11.2.2 Natural Language Processing: The TINA System

A new natural language system, TINA, has been developed in our group [284] which integrates
key ideas from context-free grammars, Augmented Transition Networks (ATN’s) [311],
and Lexical Functional Grammars (LFG’s) [66]. TINA is specifically designed to accommodate
full integration between speech recognition and natural language processing, and has a
set of features reflecting this philosophy.

The grammar begins with a set of context-free rewrite rules, which are augmented with
parameters to enforce syntactic and semantic constraints. These rules are converted
automatically to a network form, leading to extensive structure sharing. All arcs in the network
have associated probabilities, which can be trained automatically from a set of parsed
sentences. The parser uses a best-first search strategy. Control includes both top-down and
bottom-up cycles, and key parameters are passed among nodes to deal with long-distance
movement and agreement constraints. The probabilities provide a natural mechanism for
exploring more common grammatical constructions first. TINA also includes a new strategy
for dealing with movement, which can efficiently handle nested and chained gaps, and rejects
crossed gaps.
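
To illustrate how arc probabilities let a best-first search explore common constructions
first, here is a minimal sketch over a toy probabilistic network. It is not TINA's parser: the
network, its labels and probabilities are hypothetical, the network is assumed to be acyclic,
and none of TINA's constraint passing or gap handling is modeled.

```python
import heapq
import math

def best_first_paths(arcs, start, goal):
    """Yield (probability, path) for start-to-goal paths, most probable first.

    arcs: dict mapping node -> list of (label, next_node, probability).
    """
    # Heap entries: (negative log probability, tie-breaker, node, labels so far)
    heap = [(0.0, 0, start, [])]
    counter = 1
    while heap:
        cost, _, node, path = heapq.heappop(heap)
        if node == goal:
            yield math.exp(-cost), path
            continue
        for label, nxt, prob in arcs.get(node, []):
            heapq.heappush(heap, (cost - math.log(prob), counter, nxt, path + [label]))
            counter += 1

# Hypothetical network with trained arc probabilities.
arcs = {
    "S": [("what", "Q", 0.6), ("show", "V", 0.4)],
    "Q": [("is", "NP", 1.0)],
    "V": [("me", "NP", 1.0)],
    "NP": [("the ship", "END", 0.7), ("ships", "END", 0.3)],
}
results = list(best_first_paths(arcs, "S", "END"))
for prob, path in results:
    print(f"{prob:.2f}  {' '.join(path)}")
```

Because every arc probability is at most 1, the accumulated cost never decreases along a path,
so complete paths are produced in order of decreasing probability.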

Over  the  past  few  months,  TINA  has  been  ported  to  the  DARPA  1000-word  Resource 
Management  task.  We  used  the  791  designated  training  sentences  and  200  (unseen)  test 
sentences  to  evaluate  our  parser  for  coverage  and  perplexity.  The  training  was  a  two-step 
process.  We  first  expanded  the  coverage  of  the  grammar  until  it  could  handle  all  of  the  791 
training  sentences  (100%  coverage).  We  then  built  a  new  subgrammar  from  these  sentences, 
with  probabilities  on  arcs  updated  according  to  their  usage  within  the  training  set  (any  rules 
that  only  appeared  in  the  TIMIT  domain  were  automatically  discarded).  This  resulted  in 
a  grammar  that  was  tightly  defined  for  the  RM  task.  We  then  tested  this  grammar  for 
coverage  and  perplexity  on  the  200  test  sentences.  The  results  were  that  84%  of  the  test 
sentences  were  parsable,  and  the  perplexity  was  368  if  all  words  that  could  follow  each  word 
were considered to be equally likely. The surprising result was that the perplexity dropped
9-fold when arc probabilities were incorporated into the measurement, down to 41.5. We also
looked  at  the  parses  to  establish  the  depth  from  the  top  of  the  correct  parse.  We  found  that 
88%  of  the  training  sentences  gave  a  correct  parse  as  the  first  choice;  this  number  increased 
to  90%  for  the  test  sentences.  Both  sets  gave  the  correct  parse  within  the  top  three  98%  of 
the  time. 
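
For reference, the two perplexity figures differ only in the probability assigned to each
word: 1/(number of allowed successors) in the equally-likely case, and the trained arc
probability otherwise. The ratio 368/41.5 is about 8.9, consistent with the 9-fold drop
reported above. The sketch below uses the standard definition of perplexity (the exponential
of the average negative log probability per word); the toy probabilities are hypothetical.

```python
import math

def perplexity(word_probs):
    """Perplexity: exp of the average negative log probability per word."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

# Hypothetical example: three words with 4, 10, and 25 allowed successors.
branching = [4, 10, 25]
uniform = perplexity([1.0 / b for b in branching])  # geometric mean of the branching factors
trained = perplexity([0.5, 0.4, 0.2])               # trained probabilities of the words seen
print(uniform, trained)
```

With equally likely successors the perplexity reduces to the geometric mean of the branching
factors; it falls whenever the trained model gives the words actually spoken more than their
uniform share of probability.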

11.2.3  Spoken  Language  Understanding:  The  VOYAGER  System 

Over  the  past  three  months,  we  initiated  an  effort  in  spoken  language  understanding.  The 
project is motivated by our belief that many of the applications suitable for human/machine
interaction  using  speech  typically  involve  interactive  problem  solving.  That  is,  in  addition 
to  converting  the  speech  signal  to  text,  the  computer  must  also  understand  the  linguistic 
structure  of  a  sentence  in  order  to  generate  the  correct  response. 

In  order  to  explore  issues  related  to  a  fully-interactive  spoken  language  system,  we  selected 
a  task  in  which  the  system  knows  about  the  physical  environment  of  a  specific  geographical 
area, and can provide assistance on how to get from one location to another within this area.
The  system,  which  we  call  Voyager,  can  also  provide  information  concerning  certain  objects 
located  inside  this  area.  The  current  version  of  Voyager  focuses  on  the  geographic  area  of 
the  city  of  Cambridge  between  MIT  and  Harvard  University,  and  can  answer  a  number  of 
different  types  of  questions  about  certain  hotels,  restaurants,  hospitals,  and  other  objects 
within  this  region. 

Voyager  is  made  up  of  three  components.  The  first  component,  SUMMIT,  converts  the 
speech  signal  into  a  set  of  word  hypotheses.  The  natural  language  component,  TINA,  then 
provides  a  linguistic  interpretation  of  the  set  of  words.  The  parse  generated  by  the  natural 
language  component  is  then  transformed  into  a  set  of  query  functions,  which  are  passed 
to the backend for response generation. The backend is an enhanced version of the direction
assistance program developed by Jim Davis of the Media Laboratory at MIT. The
response  generator  maintains  some  knowledge  about  recent  discourse  history,  which  allows 
it  to  respond  appropriately  to  queries  such  as  “How  do  I  get  there?”  Currently,  Voyager  can 
generate  responses  in  the  form  of  text,  graphics,  and  synthetic  speech. 

As of now, Voyager has a vocabulary of approximately 400 words, and it can deal with about
half a dozen types of queries, such as the location of objects, simple properties of objects,
how to get from one place to another, and the distance and time for travel between objects.
Within this limited domain of knowledge, it is our hope that Voyager will be able to handle
any reasonable query that a native speaker is likely to initiate. As time progresses, Voyager’s
knowledge base will undoubtedly grow.

11.2.4 Isolated Word Recognition over Telephone Networks

Over the past few months, we initiated an effort to develop a small-vocabulary, isolated-word
recognition system. The focus of this research is to explore how our phonetically- and
segmentally-based approach will fare with the bandlimited and distorted speech transmitted
through local and long distance telephone networks, spoken by real users.

As a first step, we selected the task of recognizing a small set of city names. We have
implemented such a system, and have begun some preliminary evaluations, using data collected
by  NYNEX  Corporation.  We  are  also  using  this  simple  task  as  a  framework  in  which  to 
explore  the  use  of  unsupervised  learning  techniques  to  enable  the  automatic  expansion  of 
the  vocabulary. 

11.3  Student  Reports 

Nancy  Daly 

During the spring semester, Daly spent most of her time working as a teaching assistant for
a new course on automatic speech recognition introduced by Victor Zue. Over the next few
months,  she  plans  to  take  her  area  exam  and  work  on  her  doctoral  thesis,  which  is  in  the 
area  of  prosodic  aids  for  speech  recognition. 

Prosody is the stress, rhythm, and intonation of speech. While the importance of prosodic
information  has  long  been  documented  for  human  speech  communication,  automatic  speech 
recognition  systems  developed  as  of  now  have  all  but  ignored  this  source  of  information.  The 
purpose  of  her  thesis  research  is  to  see  how  prosodic  information  could  be  incorporated  into 
speech  recognition  systems  to  improve  their  performance. 

One of the first projects is to determine what form of prosodic information can be reliably
extracted  from  the  speech  signal  in  the  absence  of  any  segmental  information.  Specifically, 
she  is  investigating  whether  stressed  syllables  can  be  identified  reliably  by  native  listeners 
when  the  phoneme  identity  has  been  removed  from  the  speech  signal  through  inverse  filtering. 
This  line  of  investigation  can  lead  to  the  determination  of  the  stress  pattern  of  words,  and 
the  use  of  this  knowledge  to  aid  phonetic  recognition  and  lexical  access. 

Rob  Kassel 

Kassel  is  pursuing  a  Master’s  thesis  on  the  use  of  distinctive  features  for  lexical  access. 
Distinctive  features  have  been  proposed  by  many  as  a  sub-phoneme  linguistic  unit.  Feature 
spreading  can  concisely  represent  the  allophonic  variation  found  in  spoken  language,  an 
attractive  property  for  speech  recognition  systems.  He  began  a  study  to  determine  the 
expressive power of distinctive features in terms of information theoretical measures.

Hong  Leung 

Leung just completed his Ph.D. thesis entitled “The Use of Artificial Neural Networks for
Phonetic Recognition.” One of the major problems with current speech recognition systems
is that either the system’s self-organizing framework is very powerful but too rigid for
incorporating more human knowledge about speech, or there is a significant amount of human
knowledge in the system but the control strategy is not powerful enough. Due to their flexible
self-organizing framework, artificial neural networks (ANN’s) can potentially bridge the gap
between our knowledge in speech and powerful self-organizing mechanisms. Leung’s thesis
is concerned with the use of ANN’s for phonetic recognition. There are three major
objectives. First, by investigating ANN’s in order to gain a better understanding of their basic
characteristics and capabilities, we may be able to exploit them more fully as pattern
classifiers. Secondly, by properly applying our acoustic-phonetic knowledge, we can potentially
enhance the flexible framework of ANN’s for phonetic recognition. Thirdly, by comparing
them with traditional pattern classification techniques, we can better understand the merits
and shortcomings of the different approaches.

The multi-layer perceptron (MLP) was selected for his investigation, which centered around
a set of vowel recognition experiments. In order to isolate different sources of variability in
the speech signal, four different databases were used for our study. The largest database
consists  of  22,000  vowel  tokens  extracted  from  continuous  sentences  in  the  TIMIT  database, 
spoken by 550 male and female speakers. The performance of the network was evaluated
in  several  ways.  Evaluation  in  terms  of  average  agreement  with  the  phonetic  transcription 
suggests  that  the  performance  of  the  network  compares  favorably  to  human  performance  in 
perceptual experiments. Evaluation along the phonological dimension suggests that most of
the  confusions  between  the  network  and  transcription  labels  are  quite  reasonable. 

Next,  the  characteristics  and  representations  of  the  MLP  were  explored.  Specifically,  he 
examined  the  performance  of  the  network  as  a  function  of  the  number  of  training  iterations, 
amount  of  training  data,  number  of  hidden  units,  number  of  hidden  layers,  and  use  of  the 
nonlinear  sigmoid  function.  He  also  discussed  the  structure  and  self-organization  of  the 
internal  representations,  choices  for  output  representations,  and  the  use  of  heterogeneous 
input  representations.  Other  issues  discussed  include  error  metrics  for  training  the  network, 
initializations  of  the  network,  and  rapid  adaptation  of  the  network  to  a  new  speaker. 
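
As a minimal illustration of the quantities discussed above (hidden units, hidden layers, and
the nonlinear sigmoid function), the following sketch computes the forward pass of a small
MLP. The weights are set by hand for a two-input toy problem and are purely hypothetical; they
are not taken from the thesis, and no training procedure is shown.

```python
import math

def sigmoid(x):
    """Logistic nonlinearity applied at every unit."""
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(x, layers):
    """Forward pass; layers is a list of (weights, biases), weights[j][i] links input i to unit j."""
    activation = x
    for weights, biases in layers:
        activation = [sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
                      for row, b in zip(weights, biases)]
    return activation

# Hand-set weights (an assumption for illustration): one hidden layer of two units
# computing OR and NAND, and an output unit computing their AND, i.e. exclusive-or.
layers = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),
    ([[20.0, 20.0]], [-30.0]),
]
outputs = {inp: mlp_forward(list(inp), layers)[0]
           for inp in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(outputs)
```

Without the hidden layer and its nonlinearity, no choice of weights can separate these four
inputs, which is one reason the number of hidden units and the use of the sigmoid are natural
variables to study.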

Finally, the performance of the network was compared with that of two traditional
classification techniques. For the vowel classification task, experiments demonstrate that the
MLP  can  yield  higher  performance  than  k-nearest  neighbor  and  Gaussian  classifiers.  The 
results  suggest  that  the  MLP  can  provide  an  effective  alternative  for  pattern  classification, 
especially  if  the  classification  problem  is  not  well  understood. 

Jeffrey  Marcus 

Marcus  has  been  working  on  incorporating  speech  units  of  different  sizes  (e.g.,  phoneme, 
diphone,  word)  in  a  speech  recognition  system.  In  addition,  he  is  considering  schemes  for 
sharing  information  among  speech  units  which  have  certain  phonetic  similarities  so  that 
parameter  estimates  for  these  models  are  improved.  Another  major  goal  of  his  work  is  to 
advance  recognizer  design  methodology  by  demonstrating  the  utility  of  statistical  and  data 
analytic  techniques  which  have  not  been  applied  previously. 

The  work  is  currently  focused  on  modeling  function  words  such  as  “the”  and  “and,”  since 
they vary greatly acoustically and cause a disproportionate number of recognizer errors. In
the  future,  these  techniques  will  be  extended  to  other  lexical  and  phonetic  units. 

Helen  Meng 

Meng joined the group in January, and spent the past semester finishing her Bachelor’s thesis
and building up background in speech by taking the courses 6.979, Automatic Speech
Recognition, and 6.541J, Speech Communication. In addition, she attended spectrogram
reading sessions run in the Spoken Language Systems Group. She also wrote a term paper
entitled “An Acoustic Study of the Semi-Vowel /l/.” The paper reports a study of
prevocalic, intervocalic and postvocalic /l/’s in some data collected by Dennis Klatt. Over
the next few months, she will be familiarizing herself with the computational facilities in the
group, as well as searching for a topic for her Master’s thesis.

John  F.  Pitrelli 

Pitrelli  has  been  studying  phoneme  durations  in  order  to  develop  a  duration  model  to  aid 
speech recognition. Duration is potentially a strong cue for certain phonemic distinctions,
including inherently long vs. short vowels, and voiced vs. unvoiced obstruent consonants.
Phoneme  durations  are  affected,  though,  by  an  abundance  of  factors  ranging  from  detailed 
phonetic  context  effects  to  syntax  and  semantics.  Our  lack  of  understanding  of  these  effects 
and  their  interactions  hinders  our  use  of  potentially  useful  duration  information  to  the  extent 
that  most  speech  recognition  systems  currently  use  only  rudimentary  duration  models  or  use 
time-warping  procedures,  which  distort  duration  information. 

Recent research has focused on two facets of the duration modeling problem. One is the
completion and evaluation of a hierarchical model accounting for discrete-valued factor
variables, such as phonetic context and syntactic-unit-final lengthening. The other task has
been a preliminary exploration of the effects of speaking-rate variations on phoneme
duration. Future work includes the continuation of the speaking-rate experiments, with the
goal of improving understanding of rate effects on duration, both gradual, such as vowel
compression, and abrupt, such as flapping of alveolar stops. Following these experiments,
the hierarchical duration model will be augmented by the incorporation of an appropriate
function of speaking rate.

Dimitry  Rtischev 

Rtischev joined the group in January. Over the past five months, he worked on an interactive
software facility for simulation of hidden Markov models. The completed program,
named HIMARK, provides a flexible experimental environment for constructing, training,
and observing hidden Markov models and using them for various speech recognition tasks.
HIMARK formed the basis for two lab assignments which he prepared for 6.979, Automatic
Speech Recognition. His plans for the next year include research in applying statistical
methods such as HMMs to speech synthesis, and preparing for the Preliminary Written
Examination and Oral Exam.
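
As a small example of the kind of computation such an environment supports, the sketch below
evaluates the probability of an observation sequence under a discrete hidden Markov model with
the standard forward algorithm. It is not HIMARK code; the function name and the two-state
model parameters are hypothetical.

```python
def forward_probability(observations, initial, transition, emission):
    """P(observations) under a discrete HMM, computed with the forward algorithm.

    initial[s]: probability of starting in state s.
    transition[r][s]: probability of moving from state r to state s.
    emission[s][o]: probability that state s emits observation symbol o.
    """
    states = range(len(initial))
    alpha = [initial[s] * emission[s][observations[0]] for s in states]
    for obs in observations[1:]:
        alpha = [sum(alpha[r] * transition[r][s] for r in states) * emission[s][obs]
                 for s in states]
    return sum(alpha)

# Hypothetical two-state model with two observation symbols (0 and 1).
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.9, 0.1], [0.2, 0.8]]
p = forward_probability([0, 1], initial, transition, emission)
print(p)
```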

Michal  Soclof 

Soclof  joined  the  group  in  January,  and  spent  the  spring  semester  learning  about  automatic 
speech  recognition  by  taking  the  speech  recognition  and  spectrogram  reading  courses,  and  by 
reading relevant material. In addition, she learned about the SUMMIT system and became
familiar with the computational facilities in the group. During the upcoming months, she
will be working on her Master’s thesis research. A potential topic which she is investigating
is  the  problem  of  detecting  speech  in  the  presence  of  other  vocalizations.  This  entails  being 
able  to  distinguish  between  a  speech  event  and  a  non-speech  event  such  as  throat  clearing  or 
coughing.  She  will  be  studying  what  makes  the  two  events  different  and  possible  methods 
for  distinguishing  them. 

Sean  Trowbridge 

Trowbridge  joined  the  group  in  January,  and  spent  the  spring  term  mainly  getting  oriented, 
learning  about  speech  recognition  in  general,  and  spectrogram  reading  in  particular.  He  also 
helped  out  with  some  grading  for  6.979,  and  started  preliminary  research  for  his  Master’s 
thesis. In the coming year, he will be working with Steve Ward on a thesis that involves the
NuMesh  computer  and  its  application  to  speech  recognition.  He  will  be  designing  something 
resembling  a  compiler  for  the  machine,  which  will  take  a  computation  specification,  along 
with  some  resource  constraints,  and  produce  a  topology  and  timing  for  each  processor  that 
will  perform  the  computation  specified. 

MIT Department of Electrical Engineering and Computer Science, May 1989. Supervised
by V.W. Zue.

Theses  in  Progress 

[1] N. Daly. Prosodic Aids to Speech Recognition. PhD thesis, MIT Department of Electrical
Engineering and Computer Science, expected 1991. Supervised by V.W. Zue.

[2] R. Kassel. The Information Content of Distinctive Features in American English. Master’s
thesis, MIT Department of Electrical Engineering and Computer Science, expected
1989. Supervised by V.W. Zue.

[3] J. Marcus. Incorporation of Different Sized Units in a Speech Recognition System. PhD
thesis, MIT Department of Electrical Engineering and Computer Science, expected 1991.
Supervised by V.W. Zue.

[4] J. Pitrelli. Duration Models for Continuous Speech Recognition. PhD thesis, MIT
Department of Electrical Engineering and Computer Science, expected December 1989.
Supervised by V.W. Zue.

Talks

[1] V.W. Zue. Speech recognition at MIT. Lecture given at the External Research Symposium,
Apple Computer, Inc., Cupertino, CA, January 1989.

[2] V.W. Zue. Speech recognition at MIT. Lecture given at Siemens A.G., Munich, West
Germany, February 1989.

[3]  S.  Seneff.  TINA:  a  probabilistic  syntactic  parser  for  speech  understanding  systems. 
Lecture  given  at  AAAI  Workshop  on  Spoken  Language  Systems,  Stanford  University, 
March  1989. 

12.1 Introduction

The Systematic Program Development Group, though small, has a diverse set of interests. These
include  programming  methodology,  programming  multiprocessors,  specification  languages 
(in  conjunction  with  researchers  at  the  Digital  Equipment  Corporation),  circuit  verification 
(in  conjunction  with  researchers  at  the  Technical  University  of  Denmark),  automatic  theorem 
proving,  and  high  performance  garbage  collection. 

12.2  The  Larch  Family  of  Specification  Languages 

The  Larch  family  of  specification  languages  supports  a  two-tiered  definitional  approach  to 
specification.  Each  specification  has  components  written  in  two  languages:  one  designed  for 
a  specific  programming  language  and  another  independent  of  any  programming  language. 
The  former  are  called  Larch  interface  languages,  and  the  latter  the  Larch  Shared  Language 
(LSL).

Larch  interface  languages  are  used  to  specify  the  interfaces  between  program  components. 
Each  specification  provides  the  information  needed  to  use  the  interface  and  to  write  programs 
that  implement  it.  A  critical  part  of  each  interface  is  how  the  component  communicates 
with  its  environment.  Communication  mechanisms  differ  from  programming  language  to 
programming  language,  sometimes  in  subtle  ways.  We  have  found  it  easier  to  be  precise 
about  communication  when  the  interface  specification  language  reflects  the  programming 
language.  Specifications  written  in  such  interface  languages  are  generally  shorter  than  those 
written  in  a  “universal”  interface  language.  They  are  also  clearer  to  programmers  who 
implement  components  and  to  programmers  who  use  them. 

Each Larch interface language deals with what can be observed about the behavior of
components written in a particular programming language. It incorporates
programming-language-specific notations for features such as side effects, exception handling,
iterators, and concurrency. Its simplicity or complexity depends largely upon the simplicity
or complexity of
the  observable  state  and  state  transformations  of  its  programming  language.  Figure  12.1 
contains  a  sample  interface  specification  for  a  CLU  procedure  in  a  window  system. 

addWindow = proc (v: View, w: Window, c: Coord) signals (duplicate)
    modifies v
    ensures v' = addW(v, w, c)
    except when w ∈ v signals duplicate ensures v' = v

Figure 12.1: Sample Larch/CLU Interface Specification

Larch Shared Language specifications are used to provide a semantics for the primitive terms
used in interface specifications. Specifiers are not limited to a fixed set of primitive terms, but
can use LSL to define specialized vocabularies suitable for particular interface specifications.
For example, an LSL specification would be used to define the meaning of the symbols ∈
and addW in Figure 12.1, thereby precisely answering questions such as what it means for
a window to be in a view (visible or possibly obscured?), or what it means to add a window
to a view that may contain other windows at the same location.

The  Larch  approach  encourages  specifiers  to  keep  most  of  the  complexity  of  specifications  in 
the LSL tier for several reasons:


•  LSL  abstractions  are  more  likely  to  be  re-usable  than  interface  specifications. 

•  LSL  has  a  simpler  underlying  semantics  than  most  programming  languages  (and  hence 
than  most  interface  languages),  so  that  specifiers  are  less  likely  to  make  mistakes. 

•  It  is  easier  to  make  and  check  claims  about  semantic  properties  of  LSL  specifications 
than  about  semantic  properties  of  interface  specifications. 


12.3  The  LP  Theorem  Proving  System 

LP  has  changed  dramatically  in  the  last  year.  We  take  this  opportunity  to  present  a  fairly 
detailed  overview  of  its  current  capabilities. 

The  basis  for  proofs  in  LP  is  a  logical  system  consisting  of  equations,  rewrite  rules,  operator 
theories,  induction  rules,  and  deduction  rules,  all  expressed  in  a  multisorted  fragment  of 
first-order  logic.  A  logical  system  in  LP  is  closely  related  to  an  LSL  theory,  but  is  handled 
in  somewhat  different  ways,  both  because  axioms  in  LP  have  operational  content  as  well  as 
semantic content and because they can be presented to LP incrementally, rather than all at
once. 

12.3.1  Declarations 

Sorts,  operators,  and  variables  play  exactly  the  same  roles  in  LP  as  they  do  in  LSL.  They 
must  be  declared,  and  operators  can  be  overloaded.  The  syntax  for  operators  at  the  moment 
is  not  as  rich  as  in  LSL,  but  we  plan  to  rectify  that.  Unlike  LSL,  LP  at  present  provides  no 
scoping  for  variables. 

12.3.2  Equations  and  Rewrite  Rules 

LP is based on a fragment of first-order logic in which equations play a prominent role.
Some of LP’s inference mechanisms work directly with equations. Most, however, require
that equations be oriented into rewrite rules, which LP uses to reduce terms to normal forms.
It  is  usually  essential  that  the  rewriting  relation  be  terminating,  i.e.,  that  no  term  can  be 
rewritten  infinitely  many  times.  LP  provides  several  mechanisms  that  automatically  orient 
many  sets  of  equations  into  terminating  rewriting  systems.  For  example,  in  response  to  the 
commands 
declare sort G
declare variables x, y, z: G
declare operators e: -> G, i: G -> G, *: G, G -> G
assert
    (x * y) * z == x * (y * z)
    e == x * i(x)
    e * x == x


that  enter  the  usual  first-order  axioms  for  groups,  LP  produces  the  rewrite  rules 

(x * y) * z -> x * (y * z)
x * i(x) -> e
e * x -> x.

It automatically reverses the second equation to prevent nonterminating rewriting sequences
such as e -> e * i(e) -> i(e) -> i(e * i(e)) -> i(i(e)) -> ... The discussion of operator theories,
below,  treats  the  issue  of  termination  further. 

A  system’s  rewriting  theory  (i.e.,  the  propositions  that  can  be  proved  by  reduction  to  normal 
form)  is  always  a  subset  of  its  equational  theory  (i.e.,  the  propositions  that  follow  logically 
from  its  equations  and  from  its  rewrite  rules  considered  as  equations).  The  proof  mechanisms 
discussed  below  compensate  for  the  incompleteness  that  results  when,  as  is  usually  the  case, 
a  system’s  rewriting  theory  does  not  include  all  of  its  equational  theory.  In  the  case  of  group 
theory, for example, the equation e == i(e) follows logically from the second and third
axioms,  but  is  not  in  the  rewriting  theory  of  the  three  rewrite  rules  (because  it  is  irreducible 
and  yet  is  not  an  identity). 
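
The gap between a rewriting theory and an equational theory can be illustrated with a small sketch. The Python code below is not LP; it is a minimal term rewriter written for illustration, assuming a tuple representation of terms and an outermost-first strategy (both are choices made here, not taken from LP). It applies the three oriented group rules and shows that e and i(e) have different normal forms, so normalization alone does not prove e == i(e).

```python
# Illustrative sketch only; the term representation and strategy are assumptions.
# A variable (or uninterpreted constant) is a str; an application is a tuple
# (op, arg1, ..., argn). The constant e is the 0-ary application ("e",).
E = ("e",)

def mul(a, b):
    return ("*", a, b)

def inv(a):
    return ("i", a)

# The three oriented group rules from the text, as (lhs, rhs) pairs.
RULES = [
    (mul(mul("x", "y"), "z"), mul("x", mul("y", "z"))),
    (mul("x", inv("x")), E),
    (mul(E, "x"), "x"),
]

def match(pattern, term, subst):
    """Extend subst so that pattern under subst equals term; None if impossible."""
    if isinstance(pattern, str):  # pattern variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def substitute(term, subst):
    """Replace variables in term according to subst."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(substitute(a, subst) for a in term[1:])

def rewrite_once(term):
    """Return term after one rewrite step (root first), or None if irreducible."""
    for lhs, rhs in RULES:
        s = match(lhs, term, {})
        if s is not None:
            return substitute(rhs, s)
    if not isinstance(term, str):
        for k, arg in enumerate(term[1:], start=1):
            new = rewrite_once(arg)
            if new is not None:
                return term[:k] + (new,) + term[k + 1:]
    return None

def normalize(term):
    """Rewrite until no rule applies; terminates because the rules are terminating."""
    while True:
        nxt = rewrite_once(term)
        if nxt is None:
            return term
        term = nxt
```

For instance, normalize(mul(E, mul("a", inv("a")))) reduces to the constant e, whereas inv(E) is already irreducible and differs from E, which is exactly the incompleteness that the proof mechanisms below are meant to overcome.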

LP provides builtin rewrite rules to simplify terms involving the Boolean operators &,
|, and <=>, the equality operator =, and the conditional operator if. These rewrite rules
are  sufficient  to  prove  many,  but  not  all,  identities  involving  these  operators.  Unfortunately, 
the  sets  of  rewrite  rules  that  are  known  to  be  complete  for  propositional  calculus  require 
exponential time and space. Furthermore, they can expand, rather than simplify, propositions
that do not reduce to identities. These are serious drawbacks, because when we are
debugging  specifications  we  often  attempt  to  prove  conjectures  that  are  not  true.  So  none 
of  the  complete  sets  of  rewrite  rules  is  built  into  LP.  Instead,  LP  provides  proof  mechanisms 
that  can  be  used  to  overcome  incompleteness  in  a  rewriting  system,  and  it  allows  users  to 
add  any  of  the  complete  sets  they  choose  to  use. 

LP treats the equations true == false and x == t, where t is a term not containing the
variable  x,  as  inconsistent.  Inconsistencies  can  be  used  to  establish  subgoals  in  proofs  by 
cases  and  contradiction.  If  they  arise  in  other  situations,  they  indicate  that  the  axioms  in 
the  logical  system  are  inconsistent. 
12.3.3 Operator Theories

LP provides special mechanisms for handling equations such as x + y == y + x that cannot
be  oriented  into  terminating  rewrite  rules.  The  LP  command  assert  ac  +  says  that  +  is 
associative  and  commutative.  Logically,  this  assertion  is  merely  an  abbreviation  for  two 
equations.  Operationally,  LP  uses  it  to  match  and  unify  terms  modulo  associativity  and 
commutativity.  This  not  only  increases  the  number  of  theories  that  LP  can  reason  about, 
but  also  reduces  the  number  of  axioms  required  to  describe  various  theories,  the  number  of 
reductions  necessary  to  derive  identities,  and  the  need  for  certain  kinds  of  user  interaction, 
e.g.,  case  analysis.  The  main  drawback  of  term  rewriting  modulo  operator  theories  is  that 
it  can  be  much  slower  than  conventional  term  rewriting. 
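
To make the idea of working modulo associativity and commutativity concrete, here is a minimal Python sketch. It only decides whether two ground terms are equal modulo AC, by flattening nested applications of an AC operator and sorting the arguments; full AC matching and unification, which LP performs, are considerably more involved. The tuple representation and the function names are assumptions made for this illustration.

```python
# Illustrative sketch only; not LP's algorithm for AC matching or unification.
# A term is a str (variable or constant) or a tuple (op, arg1, ..., argn).

def flatten(term, ac_ops):
    """Canonical form: nested uses of an AC operator become one sorted argument list."""
    if isinstance(term, str):
        return term
    op = term[0]
    args = [flatten(a, ac_ops) for a in term[1:]]
    if op in ac_ops:
        flat = []
        for a in args:
            if not isinstance(a, str) and a[0] == op:
                flat.extend(a[1:])  # associativity: absorb nested applications
            else:
                flat.append(a)
        args = sorted(flat, key=repr)  # commutativity: argument order is irrelevant
    return (op, *args)

def equal_modulo_ac(s, t, ac_ops=frozenset({"+"})):
    """True if s and t are equal when every operator in ac_ops is treated as AC."""
    return flatten(s, ac_ops) == flatten(t, ac_ops)
```

With + declared AC, x + (y + z) and (z + x) + y have the same canonical form, so no rewrite rule for commutativity (which could never be oriented into a terminating rule) is needed to identify them.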

LP  recognizes  two  nonempty  operator  theories:  the  associative-commutative  theory  and  the 
commutative theory. It contains a mechanism (based on user-supplied polynomial interpretations
of operators) for ordering equations that contain commutative and associative-commutative
operators into terminating systems of rewrite rules. But this mechanism is
difficult to use, and most users rely on simpler ordering methods based on LP-suggested
partial orderings of operators. These simpler ordering methods do not guarantee termination
when equations contain commutative or associative-commutative operators, but they
work  well  in  practice.  Like  manual  ordering  methods,  which  give  users  complete  control  over 
whether  equations  are  ordered  from  left  to  right  or  from  right  to  left,  they  are  easy  to  use. 
In  striking  contrast  to  manual  ordering  methods,  they  have  not  yet  caused  difficulties  by 
producing  a  nonterminating  set  of  rewrite  rules. 

12.3.4  Induction  Rules 

LP  uses  induction  rules  to  generate  subgoals  to  be  proved  for  the  basis  and  induction  steps 
in proofs by induction. The syntax for induction rules is the same in LP as in LSL.[1] Users
can specify multiple induction rules for a single sort, e.g., by the LP commands

declare sorts E, S
declare operators
    {}: -> S
    {__}: E -> S
    __ ∪ __: S, S -> S
    insert: S, E -> S
set name setInduction1
assert S generated by {}, insert
set name setInduction2
assert S generated by {}, {__}, ∪

and can use the appropriate rule when attempting to prove an equation by induction; e.g.,

prove x ⊆ (x ∪ y) by induction on x using setInduction2

In LSL, the axioms of a trait typically have only one generated by for a sort. It is often
useful, however, to put others in the trait’s implications.

[1] The semantics of induction is stronger in LSL than in LP, where arbitrary first-order formulas cannot
be written.

12.3.5  Deduction  Rules 

LP subsumes the logical power of the partitioned by construct of LSL by allowing users to
assert deduction rules, which LP uses to deduce equations from other equations and rewrite
rules. In general, a partitioned by is equivalent to a universal existential axiom, which can
be expressed as a deduction rule in LP. For example, the LP commands

declare sorts E, S
declare operator ∈: E, S -> Bool
declare variables e: E, x, y: S
assert when (forall e) e ∈ x == e ∈ y yield x == y

define a deduction rule equivalent to the axiom

(∀ x, y : S) [(∀ e : E)(e ∈ x ⇔ e ∈ y) ⇒ x = y]

of set extensionality, which can also be expressed by assert S partitioned by ∈ in LP, as in
LSL. This deduction rule enables LP to deduce equations such as x == x ∪ x automatically
from equations such as e ∈ x == e ∈ (x ∪ x).

Deduction  rules  also  serve  to  improve  the  performance  of  LP  and  to  reduce  the  need  for  user 
interaction. Examples of such deduction rules are the builtin &-splitting law

declare variables p, q: Bool
when p & q == true yield p == true, q == true

and the cancellation law for addition

declare variables x, y, z: Nat
when x + y == x + z yield y == z

LP  automatically  applies  deduction  rules  to  equations  and  rewrite  rules  whenever  they  are 
normalized. 
12.3.6 Proof Mechanisms in LP

This  section  provides  a  brief  overview  of  the  proof  mechanisms  in  LP. 

LP  provides  mechanisms  for  proving  theorems  using  both  forward  and  backward  inference. 
Forward  inferences  produce  consequences  from  a  logical  system;  backward  inferences  produce 
lemmas  whose  proof  will  suffice  to  establish  a  conjecture.  There  are  four  methods  of  forward 
inference  in  LP. 


• Automatic normalization produces new consequences when a rewrite rule is added to
a  system.  LP  keeps  rewrite  rules,  equations,  and  deduction  rules  in  normal  form.  If  an 
equation  or  rewrite  rule  normalizes  to  an  identity,  it  is  discarded.  If  the  hypothesis  of  a 
deduction  rule  normalizes  to  an  identity,  the  deduction  rule  is  replaced  by  the  equations 
in  its  conclusions.  Users  can  “immunize”  equations,  rewrite  rules,  and  deduction  rules 
to  protect  them  from  automatic  normalization,  both  to  enhance  the  performance  of 
LP  and  to  preserve  a  particular  form  for  use  in  a  proof.  Users  can  also  “deactivate” 
rewrite  rules  and  deduction  rules  to  prevent  them  from  being  automatically  applied. 

• Automatic application of deduction rules produces new consequences after equations
and  rewrite  rules  in  a  system  are  normalized.  Deduction  rules  can  also  be  applied 
explicitly,  e.g.,  to  immune  equations. 

•  The  computation  of  critical  pairs  and  the  Knuth-Bendix  completion  procedure  produce 
consequences (such as i(e) == e) from incomplete rewriting systems (such as the three
rewrite  rules  for  groups).  We  rarely  complete  our  rewriting  systems.  However,  we  often 
make  selective  use  of  critical  pairs.  We  also  use  the  completion  procedure  to  look  for 
inconsistencies. 

•  Explicit  instantiation  of  variables  in  equations,  rewrite  rules,  and  deduction  rules  also 
produces  consequences.  For  example,  in  a  system  that  contains  the  rewrite  rules 
a < (b + c) -> true and (b + c) < d -> true, instantiating the deduction rule

when x < y == true, y < z == true yield x < z == true

with a for x, b + c for y, and d for z produces a deduction rule whose hypotheses
normalize to identities, thereby yielding the conclusion a < d -> true.
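
The explicit instantiation step described in the last item can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not LP's implementation: terms are tuples, constants are 1-tuples, variables are strings, and "normalizes to an identity" is approximated by membership in a set of facts already known to rewrite to true. All names are hypothetical.

```python
# Illustrative sketch only; TRUE_FACTS stands in for LP's rewriting system.

def substitute(term, subst):
    """Replace variables (strings) in term according to subst."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(substitute(a, subst) for a in term[1:])

# Facts that rewrite to true: a < (b + c) and (b + c) < d.
b_plus_c = ("+", ("b",), ("c",))
TRUE_FACTS = {("<", ("a",), b_plus_c), ("<", b_plus_c, ("d",))}

# Deduction rule: when x < y == true, y < z == true yield x < z == true.
HYPOTHESES = [("<", "x", "y"), ("<", "y", "z")]
CONCLUSION = ("<", "x", "z")

def instantiate(subst, facts):
    """Return the instantiated conclusion if every instantiated hypothesis
    is a known fact; otherwise return None."""
    if all(substitute(h, subst) in facts for h in HYPOTHESES):
        return substitute(CONCLUSION, subst)
    return None
```

Calling instantiate with a for x, b + c for y, and d for z returns the term for a < d; an instantiation whose hypotheses are not known facts returns None, which corresponds to a deduction rule that remains in the system without yielding a conclusion.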


There  are  also  six  methods  of  backward  inference  for  proving  equations  in  LP.  These  methods 
are  invoked  by  the  prove  command.  In  each  method,  LP  generates  a  set  of  subgoals,  i.e., 
lemmas  to  be  proved  that  together  are  sufficient  to  imply  the  conjecture.  For  some  methods, 
it  also  generates  additional  axioms  that  may  be  used  to  prove  particular  subgoals. 


• Normalization rewrites conjectures. If a conjecture normalizes to an identity, it is a
theorem.  Otherwise  the  normalized  conjecture  becomes  the  subgoal  to  be  proved. 
• Proofs by cases can further rewrite a conjecture. The command prove e by cases
t1, ..., tn directs LP to prove an equation e by division into cases t1, ..., tn (or into two
cases, t1 and not(t1), if n = 1). One subgoal is to prove t1 | ... | tn. In addition, for each
i from 1 to n, LP substitutes new constants for the variables of ti in both ti and e to
form ti' and ei', and it creates a subgoal ei' with the additional hypothesis ti' -> true. If
an inconsistency results from adding the case hypothesis ti', that case is impossible, so
ei' is vacuously true.

Case  analysis  has  two  primary  uses.  If  the  conjecture  is  a  theorem,  a  proof  by  cases 
may  circumvent  a  lack  of  completeness  in  the  rewrite  rules.  If  the  conjecture  is  not  a 
theorem,  an  attempted  proof  by  cases  may  simplify  the  conjecture  and  make  it  easier 
to  understand  why  the  proof  is  not  succeeding. 

•  Proofs  by  induction  are  based  on  the  induction  rules  described  above. 

•  Proofs  by  contradiction  provide  an  indirect  method  of  proof.  If  an  inconsistency  follows 
from  adding  the  negation  of  the  conjecture  to  LP’s  logical  system,  then  the  conjecture 
is  a  theorem. 

• Proofs of implications can be carried out using a simplified proof by cases. The command
prove t1 => t2 by => directs LP to prove the subgoal t2' using the hypothesis
t1' -> true, where t1' and t2' are obtained as in a proof by cases. (This suffices because
the implication is vacuously true when t1 is false.)

• Proofs of conjunctions provide a way to reduce the expense of equational term rewriting.
The command prove t1 & ... & tn by & directs LP to prove t1, ..., tn as subgoals.
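
The way a proof by cases circumvents incomplete rewrite rules (the n = 1 situation in the list above, with cases t1 and not(t1)) can be sketched as follows. The Python code is an assumption-laden illustration, not LP: it uses a deliberately incomplete Boolean simplifier, and a case hypothesis is modeled by replacing a propositional variable with True or False before simplifying.

```python
# Illustrative sketch only; the simplifier is intentionally incomplete.
# A formula is a bool, a str (propositional variable), or a tuple (op, ...).

def simplify(t):
    """Bottom-up simplification using a few built-in Boolean identities."""
    if isinstance(t, (bool, str)):
        return t
    op, *args = t
    args = [simplify(a) for a in args]
    if op == "not":
        (a,) = args
        return (not a) if isinstance(a, bool) else ("not", a)
    a, b = args
    if op == "&":
        if a is False or b is False:
            return False
        if a is True:
            return b
        if b is True:
            return a
    elif op == "|":
        if a is True or b is True:
            return True
        if a is False:
            return b
        if b is False:
            return a
    elif op == "==":
        if a == b:
            return True
    return (op, a, b)

def assume(t, atom, value):
    """Replace every occurrence of the variable atom in t by the Boolean value."""
    if t == atom:
        return value
    if isinstance(t, tuple):
        return (t[0],) + tuple(assume(a, atom, value) for a in t[1:])
    return t

def prove_by_cases(conjecture, atom):
    """Succeed if the conjecture simplifies to True under both atom and not(atom)."""
    return all(simplify(assume(conjecture, atom, v)) is True for v in (True, False))

# (p & q) | (not(p) & q) == q
CONJECTURE = ("==", ("|", ("&", "p", "q"), ("&", ("not", "p"), "q")), "q")
```

Here simplify(CONJECTURE) does not reduce to True, because the simplifier has no distributivity rule, but prove_by_cases(CONJECTURE, "p") succeeds: under each case hypothesis the conjecture collapses to q == q.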


LP  allows  users  to  determine  which  of  these  methods  of  backward  inference  are  applied 
automatically  and  in  what  order.  The  LP  command 


set proof-method &, =>, normalization


directs  LP  to  use  the  first  of  the  three  named  methods  that  applies  to  a  given  conjecture. 

Proofs of interesting conjectures hardly ever succeed on the first try. Sometimes the conjecture
is wrong. Sometimes the formalization is incorrect or incomplete. Sometimes the proof
strategy  is  flawed  or  not  detailed  enough.  When  an  attempted  proof  fails,  we  use  a  variety 
of  LP  facilities  (e.g.,  case  analysis)  to  try  to  understand  the  problem.  Because  many  proof 
attempts  fail,  LP  is  designed  to  fail  relatively  quickly  and  to  provide  useful  information  when 
it  does.  It  is  not  designed  to  find  difficult  proofs  automatically.  Unlike  the  Boyer-Moore 
prover,  it  does  not  perform  heuristic  searches  for  a  proof.  Unlike  LCF,  it  does  not  allow  users 
to  define  complicated  search  tactics.  Strategic  decisions,  such  as  when  to  try  induction,  must 
be  supplied  as  explicit  LP  commands  (either  by  the  user  or  by  a  front-end  such  as  LSLC). 
On  the  other  hand,  LP  is  more  than  a  “proof  checker,”  since  it  does  not  require  proofs  to  be 
described  in  minute  detail.  In  many  respects,  LP  is  best  described  as  a  “proof  debugger.” 
12.4 Hardware Verification


For  many  years,  engineers  have  used  simulation  to  convince  themselves  that  the  circuits  they 
design  behave  as  intended.  As  circuits  get  more  complex,  it  becomes  tempting  to  augment 
simulation  with  formal  proofs.  Typically,  these  proofs  involve  a  large  number  of  simple  steps. 
Doing  them  by  hand  is  cumbersome,  boring,  and  prone  to  mistakes.  Unless  these  proofs  are 
machine  generated  or  machine  checked,  there  is  very  little  reason  to  believe  them. 

In  the  past  year,  we  have  conducted  several  successful  experiments  using  a  theorem  prover, 
LP, to verify properties of VLSI circuits. We started with several circuits that had previously
been  verified  by  hand.  We  then  tried  to  construct  machine  checked  proofs  with  the  same 
structure  as  the  original  proofs.  In  the  process  of  using  LP  to  verify  the  circuits,  we  uncovered 
several  minor  errors  in,  and  simplifications  to,  the  original  circuits  and  manual  proofs. 

Any  formalized  verification  of  a  circuit  must  be  based  on  an  abstract  description  of  the  circuit. 
The  choice  of  descriptive  mechanism  depends  upon  the  intended  use.  Differential  equations, 
for  example,  are  useful  in  verifying  physical  properties  such  as  power  consumption,  timing, 
or  heat  dissipation.  Our  approach  is  aimed  at  verifying  functional  properties  of  a  design, 
and is based on describing the circuit as a parallel program, using a language, Synchronized
Transitions, developed by Jørgen Staunstrup of the Technical University of Denmark.

While  the  circuits  we  have  verified  are  not  particularly  complex,  our  experiments  yielded 
several  interesting  insights.  These  include: 


•  Even  for  simple  circuits,  one  cannot  rely  on  proofs  that  have  not  been  machine  checked. 

•  Combined  with  Synchronized  Transitions,  the  technique  of  invariant  assertions  used  to 
verify  safety  properties  of  concurrent  programs  is  useful  for  machine-aided  reasoning 
about  circuits. 

•  The  verification  process  is  quite  sensitive  to  the  exact  way  in  which  the  problem  is 
formulated.  For  example,  proofs  seem  to  work  better  when  induction  can  be  done  over 
the  structure  of  the  circuit  rather  than  over  time. 

•  Circuit  verification  seems  more  amenable  to  machine  checking  than  traditional  program 
verification. 

• The style of mechanical theorem proving supported by LP seems well suited to reasoning
about circuits.


12.5  Programming  Multiprocessors 


In this research, we consider the problem of writing explicitly parallel programs for small
to medium sized multiprocessors. To limit the scope of the work, the target applications
are assumed to be symbolic problems, which are characterized by data structures that are
irregular and dynamic and by a low percentage of numerical operations. In addition, we
consider only applications for which a sequential solution is possible, i.e., concurrency is
used to improve performance but is not inherent in the problem definition.

Various methods have been proposed to address the problem of writing and reasoning about
parallel programs, but these tend to address only single module programs. Notions of module
composition are missing and as a result, a style of program development is encouraged in
which the entire program is designed and implemented as a single unit. Conversely, methods
that have been developed for “programming in the large” of sequential programs are not
applicable, and attempts to extend them to parallel programs often result in programs that
exhibit very little real concurrency. Our goal is to be able to decompose parallel programs
into independently specifiable units without prohibiting efficient implementations.

The research is organized around an extended example application, which involves parallel
algorithm synthesis, correctness arguments, program module specifications, an implementation,
and performance measurements. The application is to solve the completion problem
for term rewriting systems, for which the well known Knuth and Bendix procedure is a
sequential solution. Although the completion problem has been studied extensively, there are
currently no parallel solutions.

Our parallel solution can be abstractly described by a set of inference rules that are
nondeterministically applied to a system of rewrite rules and equations. In recent years, there
has been a trend toward describing sequential completion procedures in this manner, often
leading to better algorithms and simpler correctness arguments. The framework is especially
useful in the context of concurrency, since there are known techniques for reasoning about
concurrent systems using the possible sequences of state transitions. In this respect, our
set of inference rules defines the set of possible state transitions, but there is an important
difference in how we intend to reason about these systems. Rather than reasoning about
the behavior of each high level transition by considering directly the sequence of low level
operations that implement it, we intend to use the specifications of the underlying objects
and their operations; each object is thus an independent unit which can be reused in any
other context for which the same specification is required.

The implementation effort is still in progress but has already uncovered a number of interesting
tradeoffs in the general programming problem. For example, concurrent data types that
present the illusion of sequential access often lead to poor performance. The performance
may be characterized by either long latency of operations, or little actual concurrency of
multiple operations. In addition, examples of highly concurrent data types tend to have
complex, almost mysterious implementations. In some cases, specification detail can be
added to allow implementations that are more efficient or less complex, but for this we pay
a price in terms of generality of the abstraction. We currently have an implementation of
a completion procedure that exploits a small amount of parallelism, and closely mimics the
behavior of a sequential implementation. As we add more parallelism to the procedure, the
mapping between the parallel and sequential solutions will become less obvious, and the
correctness argument more difficult.
12.6 Logic Programming


Most  of  the  theoretical  work  on  the  semantics  of  logic  programs  assumes  an  interpreter  that 
provides  a  complete  resolution  procedure.  In  contrast,  for  reasons  of  efficiency,  most  logic 
programming  languages  are  built  around  incomplete  procedures.  This  difference  is  rooted 
in  Prolog,  which  evaluates  resolvent  trees  in  a  depth-first  rather  than  a  breadth-first  order. 
The  gap  is  widened  by  some  equational  logic  languages,  which  combine  the  incompleteness  of 
depth-first  evaluation  with  incomplete  approximations  to  equational  unification.  Because  of 
this  gap,  it  is  unsound  to  reason  about  logic  programs  using  their  declarative  semantics.  This 
in  turn  makes  it  difficult  to  develop  abstraction  mechanisms  that  can  be  used  to  partition  a 
logic  program  into  independently  specifiable  modules. 
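
The incompleteness of depth-first evaluation can be seen in a toy search. The Python sketch below is not a Prolog interpreter; it assumes a hypothetical resolvent tree in which every node has a first child that recurses forever and a second child that succeeds, and it uses a step budget so that the depth-first run stops instead of looping.

```python
from collections import deque

# Illustrative sketch only; the tree below is a hypothetical resolvent tree.

def children(node):
    """The first clause recurses (one level deeper); the second clause succeeds."""
    if node == "goal":
        return []
    return [node + 1, "goal"]  # clause order matters: the recursive clause comes first

def search(strategy, budget=1000):
    """Return the number of nodes expanded before reaching a goal, or None if
    the budget is exhausted first. strategy is "depth" or "breadth"."""
    frontier = deque([0])
    for steps in range(1, budget + 1):
        if not frontier:
            return None
        node = frontier.pop() if strategy == "depth" else frontier.popleft()
        if node == "goal":
            return steps
        kids = children(node)
        if strategy == "depth":
            frontier.extend(reversed(kids))  # leftmost child ends up on top of the stack
        else:
            frontier.extend(kids)
    return None  # budget exhausted without finding an answer
```

Breadth-first search reaches the goal after expanding three nodes, while depth-first search follows the recursive clause until the budget runs out, even though a solution exists one step from the root. The declarative reading of the two clauses is the same in both cases; only the operational behavior differs.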

In this work, we consider the role type systems can play in closing the gap between
the  operational  and  declarative  semantics  of  logic  programs.  We  develop  the  notion  of  an 
equational  mode  system  for  use  in  constraining  the  domains  of  both  predicates  and  unification 
procedures.  The  mode  system  is  used  to  guide  the  resolution-based  interpreter,  and  as  a 
result,  we  can  show  that  two  predicate  implementations  with  the  same  declarative  meaning 
will  be  operationally  equivalent. 

This  work  was  done  in  conjunction  with  Joseph  Zachary  of  the  University  of  Utah. 
13.1 Introduction

The MIT/LCS Theory of Computation (TOC) Group is one of the largest theoretical computer
science research groups in the world. It includes faculty, students, and visitors from
both the Departments of Electrical Engineering and Computer Science, and Applied
Mathematics.

The  principal  research  areas  investigated  by  members  of  the  TOC  Group  are: 


•  algorithms:  combinatorial,  geometric,  graph-theoretic,  number  theoretic; 

•  cryptology; 

•  computational  complexity; 

•  parallel  computation; 

• distributed computation: algorithms and semantics;

•  machine  learning; 

•  semantics  and  logic  of  programs;  and 

•  VLSI  design  theory. 


Group  members  were  responsible  for  over  150  publications  and  several  dozen  public  lectures 
around  the  world  during  the  past  year.  The  individual  reports  by  faculty  and  students  in  the 
next  sections,  and  the  annotated  reference  and  lecture  lists  offer  further  descriptions  of  the 
year’s  activities. 

The  following  major  research  contributions  merit  highlighting: 


•  Awerbuch,  Mansour,  and  Shavit’s  polynomial  solution  to  the  basic  network  problem 
of  “end  to  end  communication”. 

•  Awerbuch  and  Sipser’s  efficient  implementation  (constant  time  overhead)  of  the  new 
notion  of  a  “synchronizer  for  dynamic  networks”  implying  that  dynamic  networks  are 
as  fast  as  static  networks. 

• Elias' geometric demonstration that reliable communication at a positive rate is possible
over a channel which introduces a fraction 1/2 - ε of errors, so long as the receiver
is allowed to list O(1/ε^2) possible transmitted codewords rather than just one.

•  Fortnow  and  Sipser’s  oracle  collapsing  the  probabilistic  polynomial  time  hierarchy. 
• Koch proved a decade old conjecture about the expected throughput of the dilated
butterfly  switching  network  that  has  application  to  the  optimal  design  of  networks  like 
that  used  in  the  BBN  Butterfly  Machine  (Ph.D.  thesis). 

•  Leighton  and  Maggs  developed  highly  efficient  packet  routing  algorithms  for  a  twin 
butterfly.  The  algorithms  are  the  first  fault-tolerant  routing  algorithms  for  bounded 
degree  switching  networks,  and  appear  to  be  superior  to  currently  used  algorithms  even 
if  there  are  no  faults. 


The  following  are  special  awards  received  during  the  period: 


• Lynch was chosen to deliver the keynote address at last summer’s symposium on
Principles of Distributed Computing.

•  Meyer  was  chosen  to  present  an  invited  lecture  at  the  Third  IEEE  Symposium  on  Logic 
in  Computer  Science,  July  1988. 

•  Sipser  was  chosen  as  the  principal  lecturer  in  the  American  Mathematical  Society 
Conference  on  Circuit  Complexity  to  be  held  this  August  in  Chicago. 

Baruch  Awerbuch 

Awerbuch  has  been  working  on  designing  efficient  and  reliable  distributed  protocols,  with 
emphasis  on  issues  related  to  dynamic  networks. 

He  put  a  great  deal  of  effort  into  development  of  an  efficient  compiler  for  dynamic  network 
protocols.  In  [25]  he  used  techniques  of  amortized  analysis  to  improve  the  best  known 
compiler  for  asynchronous  protocols.  Together  with  Sipser  [32],  he  introduced  a  new  concept 
of dynamic synchronizer which allows us to apply static synchronous protocols in a dynamic
asynchronous network. This protocol is very fast, requiring O(1) time overhead, thus showing
that dynamic asynchronous networks are as fast as static synchronous ones. Finally, working
with Afek and Moriel (Tel-Aviv) [3], he discovered a compiler whose overheads depend
exclusively on the overheads of the original protocol.

Awerbuch  also  worked  on  many  specific  problems  in  dynamic  networks.  Together  with 
Shavit  and  Mansour  [31],  Awerbuch  discovered  the  first  polynomial  solution  to  the  end-to- 
end  communication  problem.  This  is  one  of  the  basic  network  problems;  it  was  conjectured  in 
[4] that it has no polynomial solution. Together with Goldberg (Stanford), Luby (ICSI) and
Plotkin (Stanford) he found a new technique [28] for removing randomness from distributed
computing that has yielded fast deterministic algorithms for Maximal Independent Set, Δ+1
Coloring  and  Breadth  First  Search.  Together  with  Kutten  (IBM  Yorktown)  and  Cidon 
(IBM  Yorktown),  he  discovered  an  efficient  algorithm  for  maintaining  a  tree  in  a  dynamic 
network. Together with Goldreich and Herzberg (Technion) [29], he developed a quantitative
framework  for  analyzing  performance  of  broadcast  protocols  in  dynamic  networks. 

Another area of Awerbuch’s research has been in distributed graph algorithms. Together with
Bar-Noy  (Stanford  University),  Linial  (IBM  Almaden  Research  Center),  and  Peleg  (Stanford 
University)  [27],  he  discovered  new  routing  schemes  that  use  only  bounded  space,  have  low 
communication  overhead,  can  be  constructed  online,  work  for  weighted  graphs,  and  do  not 
require  changes  in  node  identities.  He  discovered  a  new  efficient  BFS  and  Shortest  Paths 
algorithm  [26],  which  is  efficient  both  in  time  and  in  communication.  This  algorithm  has 
an  interesting  recursive  structure.  Together  with  Goldreich  (The  Technion),  Peleg  (Stanford 
University),  and  Vainish  (The  Technion)  [30],  Awerbuch  studied  performance  of  broadcast 
protocols  in  point-to-point  networks. 

Peter  Elias 

The  paper  on  the  zero-error  capacity  of  a  binary  channel  under  jamming  using  list  decoding, 
which  was  accepted  for  publication  at  the  time  of  the  last  annual  progress  report,  has  since 
appeared  [98].  Its  appearance  led  to  correspondence  with  Korner,  who  has  been  working 
on  related  topics  with  Marton  and  Simonyi.  Their  work  arose  from  a  paper  on  hashing  by 
Fredman  and  Komlos  [114].  They  published  one  paper  [185]  and  submitted  two  more,  which 
include  new  results  relevant  to  zero-error  capacity  under  list  decoding. 

The  second  paper  mentioned  in  the  last  progress  report,  which  does  not  have  to  do  with 
zero-error  capacity  but  with  error-correcting  codes  under  list  decoding,  has  appeared  as  a 
technical report and has been submitted for publication [99].

Current  work  explores  iterative  coding  schemes.  These  schemes  generate  codes  which  differ 
from  typical  error-correcting  block  codes  in  that  they  are  not  guaranteed  to  correct  all  sets 
of  less  than  k  errors  out  of  n  for  some  integers  k,n  but  only  most  such  sets.  Only  codes  with 
this property can be used to communicate at rates near channel capacity: as discussed in
[98] and [99], the capacity of a channel subject to a jammer who can alter any k symbols
out  of  n  is  significantly  less  than  that  of  a  channel  in  which  bits  are  subject  to  statistically 
independent  errors  with  probability  k/n. 

The first analysis of these codes appeared in [97]. It showed that they could be used to
transmit  without  error  at  a  positive  rate,  by  using  check  symbols  to  correct  each  row  of 
transmitted  symbols,  rows  of  check  symbols  to  correct  each  column  in  a  two  dimensional 
array,  layers  of  check  symbols  to  check  preceding  layers  in  a  rectangular  solid,  and  so  on.  The 
fraction  of  the  symbols  used  for  checking  is  less  than  1  in  the  limit  if  the  sizes  of  successive 
dimensions  increase,  e.g.,  in  a  geometric  series. 
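
As a rough illustration of the row-and-column idea, the following Python sketch uses a single
parity bit per row and per column. This is a simplifying assumption: the component codes
analyzed in [97] are not specified here, and all function names are hypothetical. The sketch
corrects one flipped bit in a two dimensional array, and computes the fraction of symbols that
carry data when the sizes of successive dimensions grow in a geometric series.

```python
from typing import List

Block = List[List[int]]


def encode(data: Block) -> Block:
    """Append a parity bit to every row, then a parity row over all columns."""
    rows = [row + [sum(row) % 2] for row in data]
    parity_row = [sum(col) % 2 for col in zip(*rows)]
    return rows + [parity_row]


def correct_single_error(block: Block) -> Block:
    """Flip the bit where the one failing row meets the one failing column."""
    fixed = [row[:] for row in block]
    bad_rows = [i for i, row in enumerate(fixed) if sum(row) % 2]
    bad_cols = [j for j, col in enumerate(zip(*fixed)) if sum(col) % 2]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    return fixed


def information_rate(first_size: int, ratio: int, dimensions: int) -> float:
    """Fraction of data symbols when dimension i has first_size * ratio**i
    symbols per line and (by assumption) one of them is a check symbol."""
    rate = 1.0
    for i in range(dimensions):
        rate *= 1.0 - 1.0 / (first_size * ratio ** i)
    return rate
```

With sizes 4, 8, 16, and so on, information_rate(4, 2, d) stays above 0.57 however large d
becomes, which is the sense in which the checking fraction remains below 1 in the limit.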

In [97], each order of check symbols is used only once and then discarded. That sufficed
to show that communication at a positive rate is possible, but the proof gives a rate
substantially below channel capacity. The rates of iterated codes come much closer to capacity
when lower order check bits are used to make further corrections after each use of higher
order check bits, and the process is continued until a stable state is reached. Since
statistical independence disappears after such recycling, getting tight bounds on the amount of
improvement is difficult. Both analysis and simulation are being used to explore this domain.

Shafi Goldwasser

Goldwasser’s  work  focused  on  designing  efficient  digital  signature  schemes  and  on  designing 
multi-party  secure  cryptographic  protocols. 

Beaver and Goldwasser [36] designed a protocol for n processors, a majority of which can be
faulty, to compute any polynomial time function defined on the processors’ private inputs.
The function is computed preserving privacy. Namely, no coalition of faulty processors
can discover more about non-faulty processors’ inputs than is implied by the function value.
Moreover, the faulty processors can find out the function value “if and only if” the
non-faulty processors find out the function value, in a strong probabilistic sense. This is the
first solution in the case where the faulty processors constitute a majority of the network
processors.

Ben-Or, Kilian, Goldwasser, and Wigderson [43] designed two extremely efficient user
identification methods (using no modular multiplications and based on the difficulty of the
NP-complete subset-sum problem). These schemes work in the two-prover interactive proof
model introduced by the same authors in ’88. Namely, the prover (e.g., the bank card holder)
is split into two agents, and the verifier (e.g., the bank teller machine) guarantees that the
two agents cannot transfer information to each other during the identification process.

Bellare and Goldwasser [37] introduced new paradigms for digital signatures and message
authentication which are a complete departure from the digital signature schemes based on the
Diffie-Hellman trapdoor function model or the recent digital signature scheme of Naor-Yung.
The  new  scheme  is  based  on  the  use  of  random  functions  and  noninteractive  zero-knowledge 
proofs. 

Goldwasser  has  also  been  developing  a  monograph  of  lecture  notes  in  cryptography,  an 
outgrowth  of  her  lectures  in  the  MIT  cryptography  and  cryptanalysis  class.  Goldwasser 
chaired  the  CRYPTO-88  conference  held  in  Santa  Barbara  in  August  1988.  She  was  a 
member  of  the  STOC  1989  conference  committee,  and  together  with  Rivest,  wrote  a  survey 
article  on  cryptography  for  the  handbook  on  computer  science. 

Tom  Leighton 

Together,  Leighton  and  his  students  made  solid  progress  on  packet  routing  algorithms,  fault 
tolerance  in  networks,  and  on  graph  embedding  problems.  At  this  point  they  are  getting 
close  to  asymptotically  optimal  results  that  also  appear  to  work  well  in  reality.  In  fact,  the 
highlight of the coming summer and fall will be to help design and lay out a multibutterfly
network for Tom Knight’s new machine. With a little luck, theory will be able to play an
important role in the development of a state of the art machine. They are also working with
Bill Dally and his students to see if theory can be helpful with the routing protocols on his
new machine, and have been talking with Alan Baratz about the possibilities of implementing
some of the new theory routing algorithms on the GF11 so that it can become a general
purpose routing machine.

Another highlight of the coming year will be the new ACM Symposium on Parallel Algorithms
and Architectures that Leighton has been helping to organize. The first meeting will
be in Santa Fe in mid-June, and there should be a large contingent from MIT at the meeting.
Papers  to  be  presented  range  from  theory  to  practice,  and  the  meeting  should  provide  a  good 
forum  for  interaction  between  people  who  think  about  parallel  machines,  those  who  build 
them,  and  those  who  use  them.  The  1990  meeting  is  in  Crete,  so  now  would  be  a  good  time 
to  start  thinking  about  submitting  a  paper! 

Maggs,  Rao,  Koch,  and  Newman  are  students  getting  their  Ph.D.’s  this  year. 

In addition, Leighton is continuing work on his book on parallel computation. He expects
to have Volume I done by early next year.

Charles  E.  Leiserson 

Leiserson  returned  in  January  from  a  leave  of  absence  at  Thinking  Machines  Corporation, 
where  he  worked  on  the  design  of  a  parallel  computer.  He  was  an  invited  speaker  at  the 
Anniversary Symposium for Project MAC at MIT, and at the Decennial Caltech VLSI
Conference. He served on the program committee for the IEEE Foundations of Computer
Science Conference. He also served on the first program committee for the ACM Symposium
on Parallel Algorithms and Architectures.

Leiserson has spent much of his time in the past year working on a textbook entitled
Introduction to Algorithms, coauthored with Cormen and Rivest. The textbook attempts to
provide  a  rigorous,  but  elementary,  introduction  to  the  area  of  analysis  of  algorithms.  It  will 
be  published  jointly  by  MIT  Press  and  McGraw-Hill  later  this  year. 

Two  of  Leiserson’s  Ph.D.  students  completed  their  degrees  in  the  past  year.  Plotkin’s  thesis  is 
entitled  Graph-Theoretic  Techniques  for  Parallel,  Distributed,  and  Sequential  Computation. 
Plotkin  assumed  a  postdoctoral  position  at  Stanford  and  will  be  an  assistant  professor  in 
the fall. Blelloch’s thesis is entitled Scan Primitives and Parallel Vector Models. Blelloch
accepted an assistant professorship at Carnegie-Mellon University.

Three students completed their Master’s degrees under the supervision of Leiserson: Ishii’s
thesis, A Digital Model for Level-Clocked Circuitry; Park’s thesis, Notes on Searching in
Multidimensional Monotone Arrays; and Fried’s thesis, VLSI Processor Design for
Communication Networks.

Leiserson has also been supervising Cormen, Greenberg, Kipnis, Maggs, Phillips, and
Papaefthymiou.

Nancy  A.  Lynch 


Please see the entry under the chapter on Theory of Distributed Systems.

Albert R. Meyer

Meyer’s  research  has  focused  on  semantics  and  logic  of  programming  languages.  During  the 
past  year,  he  worked  on  the  following: 


Research  Topics 

•  Semantics  of  Concurrency:  Meyer,  with  Bloom  and  Istrail  (Wesleyan),  question  the 
foundations  of  Hoare’s  CSP  and  Milner’s  CCS  theories  of  concurrency  [51][53][52]. 
They  propose  a  new  notion  of  process  equivalence  and  show  it  lies  strictly  between 
that  of  CSP  and  CCS.  See  the  report  of  Bard  Bloom  for  more  complete  discussion. 

•  Semantics  of  Terminating  Evaluation:  Research  with  Bloom,  Riecke,  and  Cosmadakis 
(IBM  Watson  Research  Center)  on  the  general  connection  between  operational  and 
denotational  semantics,  focusing  on  repairing  the  mismatch  between  semantics  in  which 
expressions M and λx.M mean the same thing, even though evaluation of M diverges
but evaluation of λx.M terminates immediately, cf. [233][85][54]. See the report of
Riecke. 

•  Dataflow  Semantics:  See  the  report  of  Rudich. 

•  Theory  of  Sequential  Functions:  See  the  report  of  Jim. 

•  Type-checking  for  Records  with  Inheritance:  See  the  report  of  Jategoankar. 


Professional  Activities 

•  Chairman, MIT Project MAC 25th Anniversary Celebration, October 1988.

•  Conference Chairman, IEEE Symposium on Logic in Computer Science (LICS),
Seattle, WA, May 1989.

•  Moderator for three Computer Science research email forums on (1) types,
(2) concurrency, and (3) logic.

•  Member, Program Committee, International Symposium on Logic at Botik,
Pereslavl-Zalessky, USSR, July 1989; “Kleene ’90” Logic Symposium, Chaika, Bulgaria,
June 1990.

•  Thesis  Supervision: 

PhD  Bloom, expected September 1989.

SM   Riecke, completed January 1989 [273].
     Jategoankar, expected September 1989 [171].
     Jim, expected January 1990.
     Rudich, expected January 1990.

SB   Ernst, completed May 1989 [100].
     Lent, expected May 1990.
     Siegel, expected September 1989.


•  Editorial  Activity: 

Editor-in-Chief, Information and Computation; Managing Editor, Annals of Pure and
Applied Logic; Editorial Board Member, SIAM Journal on Computing, Journal of
Computer and System Sciences, Theoretical Computer Science, and Advances in Applied
Mathematics; Advisory Editor, Handbook of Logic in Computer Science and Handbook
of Theoretical Computer Science; Co-editor, Proceedings of Logic at Botik [235]; MIT
Press Foundations of Computing Series Co-editor; MIT Press Editorial Board Member.


Silvio  Micali 

Micali’s work focused on cryptography and zero-knowledge proofs. In particular, the
following results were obtained:

1.  Goldreich, Micali, and Wigderson previously proved that all theorems in NP possess a
zero-knowledge proof. Extending that work, [41] showed that whatever can be efficiently
verified can be proven in zero knowledge.

2.  [237]  constructed  a  very  efficient  “password”  scheme.  The  person  seeking  identification 
is required to perform the equivalent of two multiplications modulo an integer that
is  hard  to  factor.  These  special  “passwords”  are  hard  to  compromise  both  by  someone 
simply  listening  to  the  identification  process  and  by  the  password  verifier  herself. 

Ronald  L.  Rivest 

Rivest’s  work  focuses  on  the  theoretical  aspects  of  machine  learning. 

Rivest  is  continuing  to  work  with  Schapire  on  problems  related  to  the  inference  of  finite 
automata.  Their  motivation  has  been  the  “artificial  intelligence”  problem  faced  by  a  robot 
placed  in  an  unfamiliar  environment  with  no  a  priori  knowledge  of  its  world.  The  goal  of 
the  robot  is  to  learn  the  structure  of  its  environment  through  systematic  experimentation. 

Schapire  and  Rivest  [274]  have  been  developing  an  interesting  extension  to  Angluin’s  finite 
automaton  inference  procedure  [15].  The  new  algorithm  can  infer  an  automaton  even  when 
no “reset” is available (i.e., there is no means of bringing the automaton back to the start
state), and can be used for inferring automata using either the global state-space
representation or the diversity-based representation previously developed by Rivest and Schapire.
The algorithm has been implemented and seems quite efficient in practice.

Together with Goldman and Schapire, Rivest studied the problem of “learning a binary
relation” [130]. In this problem, the entries of a matrix representing a binary relation are
repeatedly probed. Before each probe, the “learner” must predict the value of the matrix
entry about to be probed. The goal of the learner is to make as few prediction errors as
possible.  In  order  to  model  the  natural  “structure”  that  may  be  present  in  many  binary 
relations,  such  structure  being  what  gives  the  learner  the  leverage  needed  to  make  fewer 
than  the  maximum  possible  number  of  prediction  errors,  it  is  assumed  that  there  are  only 
a  small  number  k  of  different  row  types.  Algorithms  are  developed  and  analyzed  that  make 
a  small  number  of  errors  in  this  case,  and  some  interesting  lower  bounds  (based  on  the 
existence  of  projective  geometries)  are  proved. 

Goldman and Rivest [129] also worked on the problem of efficiently implementing the
“halving algorithm.” The halving algorithm applies to situations (like the relation-learning
problem of the last paragraph) where the learner must predict the classification of each instance
before being told the true classification, and where the learner’s goal is to minimize the
number of prediction errors made. The halving algorithm (due to Barzdin and Freivalds
[35], and refined by Littlestone [215]) predicts according to the majority of the hypotheses
consistent with all previous data; when a prediction error is made it therefore reduces by at
least half the number of consistent hypotheses remaining. Based on a proposal by Warmuth,
Goldman and Rivest have investigated the use of approximate counting schemes in order to implement
approximations  to  the  halving  algorithm.  This  idea  can  be  made  to  work  out,  and  can  be 
applied  to  problems  such  as  learning  a  total  order.  (This  problem  is  then  rather  like  the 
problem  of  sorting,  where  an  adversary  gets  to  pick  which  elements  are  to  be  compared  next, 
and  where  you  must  predict  the  outcome  before  each  comparison  is  made.) 
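
The majority-vote rule described above can be sketched in a few lines of Python. This is a
minimal illustration over an explicitly enumerated hypothesis class, not the approximate
counting implementation studied in [129]; the names halving_learner and threshold_hypotheses
and the tie-breaking rule are assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple

Hypothesis = Callable[[int], bool]


def threshold_hypotheses(n: int) -> List[Hypothesis]:
    """Toy hypothesis class: h_t(x) is true exactly when x >= t, for t = 0..n-1."""
    return [lambda x, t=t: x >= t for t in range(n)]


def halving_learner(
    hypotheses: Iterable[Hypothesis],
    examples: Iterable[Tuple[int, bool]],
) -> Tuple[int, int]:
    """Run the halving algorithm; return (prediction errors, hypotheses left)."""
    consistent = list(hypotheses)
    mistakes = 0
    for x, label in examples:
        votes_true = sum(1 for h in consistent if h(x))
        # Predict with the majority of consistent hypotheses (ties go to True).
        prediction = 2 * votes_true >= len(consistent)
        if prediction != label:
            mistakes += 1  # at least half of `consistent` voted wrongly
        # Keep only the hypotheses that agree with the revealed label.
        consistent = [h for h in consistent if h(x) == label]
    return mistakes, len(consistent)
```

If the target is one of N hypotheses, each prediction error removes at least half of the
consistent set while the target always survives, so the learner makes at most log2(N) errors.
With the 16 thresholds above that is at most 4 errors, whatever order the instances arrive in.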

Sloan  has  finished  up  his  Ph.D.  under  Rivest’s  supervision  [290].  His  thesis  explores  a  number 
of  fascinating  issues  and  topics  in  machine  learning  theory,  such  as  the  effect  of  noisy  data  on 
learnability,  techniques  for  learning  a  complicated  concept  reliably  and  usefully  by  learning 
it  “gate  by  gate”  (subconcept  by  subconcept),  and  methods  for  combining  classical  Bayesian 
inference  with  computational  complexity  considerations. 

Linial,  Mansour,  and  Rivest  extended  and  presented  their  work  showing  that  a  finite  Vapnik- 
Chervonenkis  dimension  is  not  a  limitation  for  learning  a  concept  class,  if  the  size  of  the  data 
sample  used  for  learning  can  be  adjusted  dynamically  as  learning  proceeds  [209].  Intuitively, 
an  algorithm  can  dynamically  request  more  data  when  it  discovers  that  the  concept  being 
learned is “complex.”

Blum  finished  up  his  Master’s  thesis  [55]  under  Rivest’s  supervision,  and  the  work  he  and 
Rivest have done on the complexity of training even very simple neural networks was
presented at NIPS [57]. The basic result is that training a three-neuron neural network is
NP-complete. 

Under  Rivest’s  supervision,  Perugini  has  experimentally  examined  the  effect  of  training  set 
data  size  on  the  efficacy  of  the  “back-propagation”  training  algorithm  for  neural  nets  [259]. 
The results were not crisp, but some interesting pathologies were uncovered.

Together with Cormen and Leiserson, Rivest worked on an introductory text on algorithms
[84].  This  text  should  be  suitable  for  both  introductory  undergraduate  and  introductory 
graduate  students;  it  should  be  out  later  this  year. 

David B. Shmoys

Shmoys  studied  a  wide  range  of  questions  in  the  design  and  analysis  of  efficient  algorithms. 
He  continued  his  work  in  the  design  and  analysis  of  approximation  algorithms,  as  well  as  in 
the  design  of  parallel  algorithms  for  graph  problems. 

One  of  the  most  important  algorithms  used  in  the  solution  of  the  traveling  salesman  problem 
is  a  procedure  due  to  Held  and  Karp  [157]  that  produces  an  extremely  tight  lower  bound  on 
the  value  of  the  optimal  solution.  With  Williamson  [287],  Shmoys  considered  this  procedure 
as  an  approximation  algorithm  for  the  value  of  the  optimal  TSP  solution.  First,  they  showed 
that  the  algorithm  has  an  important  monotonicity  property,  in  the  sense  that  the  bound 
delivered  for  a  subset  of  the  input  is  no  more  than  for  the  entire  input.  This  property  makes 
it possible to prove that the procedure delivers a value at least 2/3 of the optimal value.
Unlike  Christofides’  algorithm,  which  is  the  best  known  approximation  algorithm  for  the 
problem  (and  guarantees  identical  performance),  this  bound  is  not  known  to  be  tight. 

One major area of Shmoys’ research is the theory of scheduling. Together
with Lawler (UC Berkeley), Lenstra (CWI) and Rinnooy Kan (Erasmus) [197], he wrote a
survey  article  of  the  field.  This  survey  was  written  as  part  of  the  preparation  for  a  book  on 
this  subject  by  these  authors. 

With Hall (Sloan School/MIT) [143][142], Shmoys has been considering a variety of
approximation algorithms for scheduling problems. In particular, he has been studying the effect
of precedence constraints and related timing constraints on the possibility of obtaining good
approximate solutions. Hall and Shmoys [143] consider the problem of scheduling n jobs on
a single machine, where each job j has a specified release date r_j before which it cannot be
processed, a time p_j that specifies the amount of (continuous) processing required, and a
deadline d_j. (For technical reasons, the deadlines are non-positive.) If the lateness of a job is
the difference between the time that a job completes processing and its deadline, the aim is
to find a schedule that minimizes the total lateness. For the variant of the problem without
precedence  constraints,  a  polynomial  approximation  scheme  is  obtained.  For  the  problem 
with  precedence  constraints,  they  give  an  algorithm  that  delivers  a  solution  that  finishes 
within a factor of 4/3 of the optimal time (improving on the previous best algorithm that only
came within a factor of 2). This represents an interesting breakthrough of a “factor of 2”
barrier that is prevalent in approximation algorithms for precedence constrained scheduling
problems. Also with Hall [142], Shmoys considers the natural generalization of the previous
work  to  the  case  when  there  are  parallel  identical  machines  to  do  the  processing.  For  this 
problem  without  precedence  constraints,  a  polynomial  approximation  scheme  was  obtained. 
With precedence constraints, an algorithm that delivers a solution within a factor of 2 of
the optimal was obtained.
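
To make the single-machine model concrete, the following Python sketch computes the lateness
of every job for a given processing order. It only evaluates a schedule and is not the
approximation scheme of [143]; the function name, the job names, and the numeric data are
hypothetical, and the example uses positive deadlines for readability even though the analysis
above assumes non-positive ones.

```python
from typing import Dict, List, Tuple

# Each job is (release date r_j, processing time p_j, deadline d_j).
Job = Tuple[int, int, int]


def job_lateness(jobs: Dict[str, Job], order: List[str]) -> Dict[str, int]:
    """Process jobs one at a time in `order`, never starting a job before its
    release date, and return completion time minus deadline for each job."""
    time = 0
    lateness: Dict[str, int] = {}
    for name in order:
        release, processing, deadline = jobs[name]
        start = max(time, release)  # the machine may idle until the release date
        time = start + processing   # processing is continuous (no preemption)
        lateness[name] = time - deadline
    return lateness


example_jobs = {"a": (0, 3, 5), "b": (1, 2, 4), "c": (4, 1, 6)}
```

For instance, job_lateness(example_jobs, ["b", "a", "c"]) gives lateness -1 for b and 1 for
both a and c, since b runs from time 1 to 3, a from 3 to 6, and c from 6 to 7.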

In the area of parallel graph algorithms, together with Goldberg (Stanford), Plotkin
(Stanford), and Tardos [124], Shmoys considers the question of parallel algorithms for bipartite
matching. By using techniques developed for general purpose sequential algorithms for linear
programming, so-called interior-point methods, they obtain an algorithm that requires only
O*(√m) steps on a polynomial number of processors, where m denotes the number of edges
in  the  graph,  and  O*  indicates  that  lower  order  polylogarithmic  factors  have  been  ignored. 

Michael Sipser

Sipser is continuing his work on lower bounds in complexity theory and the structure of
complexity  classes. 

One of the important achievements of the past year was the construction of an oracle
collapsing the probabilistic time hierarchy, done jointly with Fortnow [112]. Time hierarchies
for deterministic and nondeterministic computation are among the earliest results proved in
complexity theory. They show that if one is allowed a little more time then one can solve a
larger class of problems. Oddly, this has never been established for probabilistic
computation. It is possible that any problem solvable in probabilistic polynomial time can also be
solved  in  probabilistic  linear  time,  surprising  though  this  would  be.  Our  result  shows  why 
this  problem  has  remained  open.  The  existence  of  our  oracle  indicates  that  the  techniques  of 
recursive  function  theory,  which  solved  the  previous  cases,  are  insufficient  to  solve  this  case. 
Fortnow  is  receiving  his  Ph.D.  this  year  under  Sipser’s  guidance. 

Sipser  also  considered  some  problems  in  the  theory  of  distributed  computing.  Together  with 
Awerbuch, he gave a method which facilitates the design of network protocols [32]. Using
this method, one can first design a protocol to run on a static, synchronous network and
then automatically convert it to run on a dynamic, asynchronous network. The former
network  model  is  a  simpler  one  on  which  to  conceive  designs,  whereas  the  latter  model  is 
more  realistic. 

Together with Boppana, Sipser prepared a definitive survey on lower bounds on the circuit
complexity of boolean functions [60]. This will appear in the forthcoming Handbook of
Theoretical Computer Science. Sipser was selected to be the principal speaker at an American
Mathematical  Society  conference  on  circuit  complexity.  He  will  prepare  a  monograph  of  these 
lectures  to  be  included  in  the  AMS  CBMS  series. 

Eva  Tardos 

Tardos  has  been  working  on  combinatorial  optimization  problems.  Together  with  Goldberg 
and Plotkin from Stanford and Shmoys from MIT [124], she developed an O*(√m) time
algorithm, where n and m denote the number of nodes and edges of the input graph and an
algorithm is said to run in O*(f(n)) time if it runs in O(f(n) log^k(n)) time for some constant
k. In this paper, interior-point methods for linear programming, developed in the context of
sequential  computation,  are  used  to  obtain  a  parallel  algorithm  for  the  bipartite  matching 
problem.  The  results  extend  to  the  weighted  bipartite  matching  problem  and  to  the  zero- 
one minimum-cost flow problem, yielding O*(√m log C) algorithms, where it is assumed that
the weights are integers in the range [-C, ..., C] and C > 1. These results improve previous
bounds  on  these  problems  and  introduce  interior-point  methods  to  the  context  of  parallel 
algorithm  design. 

In a joint paper with Plotkin from Stanford [266], Tardos gave an improved dual network
simplex algorithm. A simplified version of Orlin’s [248] strongly polynomial minimum-cost
flow algorithm is developed, and it is shown how to convert it to a dual network simplex.
The pivoting strategy leads to an O(m² log n) bound on the number of pivots, which is better
by a factor of m compared to the previously best pivoting strategy due to Orlin [247]. Here
n and m denote the number of nodes and arcs in the input network.

In  a  joint  paper  with  Frank  from  Budapest  and  Nishizeki,  Saito,  and  Suzuki  from  Tokyo, 
Tardos  developed  simple  efficient  algorithms  for  the  routing  problems  around  a  rectangle. 
These algorithms find a routing with two or three layers for two-terminal nets specified on the
sides  of  a  rectangle.  The  minimum  area  routing  problem  is  also  solved.  All  algorithms  run  in 
linear  time.  The  minimum  area  routing  problem  was  previously  considered  by  LaPaugh  and 
Gonzalez and Lee. The algorithms they developed run in time O(n³) and O(n), respectively.
The simple linear time algorithm is based on a theorem of Okamura and Seymour, and on a
data structure developed by Suzuki, Ishiguro, and Nishizeki.

Tardos has also written two surveys this year: a general survey on complexity theory for The
Handbook of Combinatorics [286], jointly with Shmoys from MIT, and a survey on recent
developments in the theory of network flows [125], jointly with Goldberg from Stanford and
Tarjan from Princeton.


13.2  Student,  Research  Associate,  and  Visitor  Reports 

Javed  A.  Aslam 

Aslam has been working with Rivest on algorithms for machine learning. Specifically, he
has  been  studying  the  radial  mapping  problem  where  a  device  must  infer  the  shape  of  its 
surroundings  by  rotating  in  place  and  taking  distance  measurements.  Relevant  cases  studied 
have  included  those  where  angular  positioning  error  and  distance  measurement  error  are 
present in varying degrees. Aslam recently began work on the inference of Markov chains,
and plans to continue this work with Rivest over the summer.

Mihir  Bellare 

Basic  cryptographic  primitives  such  as  zero-knowledge  proofs  and  oblivious  transfer  have 
classically relied on interaction between the parties involved. A part of Bellare’s work has
focused on a new public key model in which such interaction can be removed.

Bellare and Micali [38] proposed a method via which a collection of users may first establish
public keys and then be able to accomplish oblivious transfer without interaction. Using
earlier work, this yields noninteractive methods for zero-knowledge proofs.

Bellare and Goldwasser [37] demonstrated the wide applicability of such noninteractive
zero-knowledge proofs by using them to get simple and efficient schemes for digital signatures
and message authentication. A feature of this work was an implementation of noninteractive
zero-knowledge proofs which could be checked by any user in the system rather than by a
single recipient.

In further work related to the role of interaction in zero-knowledge proofs, Bellare, Micali and
Ostrovsky |!!!lj showed that the languages of graph isomorphism and quadratic residuosity
have constant round perfect zero-knowledge interactive proofs. They also provided a general
mechanism to collapse rounds in a statistical zero-knowledge proof while preserving the
statistical zero-knowledge, given some standard cryptographic assumption.

Bonnie Berger

Berger has been working on removing randomness from parallel and sequential algorithms.
This involves coming up with a randomized algorithm for a problem, if one does not exist,
and devising or using known techniques to remove this randomness.

Berger began this work at Bell Labs last summer when, with Shor, she devised a randomized
sequential algorithm for the acyclic subgraph problem (the dual of the feedback arc set
problem) and used known, highly sequential techniques to convert it to a deterministic
one, thereby achieving tight bounds deterministically for the problem [48]. This work also
included an RNC algorithm for the problem which, by applying techniques explored in her
subsequent work, Berger is attempting to convert to a deterministic one.

Berger’s subsequent work centered around removing randomness from parallel algorithms.

Berger and Rompel [-l(i|['M| developed a general framework for removing randomness from
randomized NC algorithms whose analysis uses only polylogarithmic independence. Previously,
no techniques were known to determinize those RNC algorithms depending on more
than constant independence. One application of their techniques is an NC algorithm for the
set discrepancy problem, which can be used to obtain many other NC algorithms, including
a better NC edge coloring algorithm. As another application of their techniques, they provided
an NC algorithm for the hypergraph coloring problem. This work has been chosen for
the FOCS ’89 Machtey Award.

Berger, Rompel, and Shor |  17| gave NC approximation algorithms for the unweighted and
weighted set cover problems. Their algorithms use a linear number of processors and give a
cover that has at most log n times the optimal size/weight, thus matching the performance
of the best sequential algorithms. Previously, there were no known parallel algorithms for
the general set cover problem. Berger, Rompel, and Shor devised a randomized algorithm,
depending on only pairwise independence, and then converted it to a deterministic one.
The difficult part here was coming up with the randomized algorithm. Furthermore, they
applied their set cover algorithm to learning theory, giving an NC algorithm to learn the
concept class obtained by taking the closure under finite union or finite intersection of any
concept class of finite VC dimension which has an NC hypothesis finder. In addition, they
gave a linear processor NC algorithm for a variant of the set cover problem first proposed
by Chazelle and Friedman, and used it to obtain NC algorithms for several problems in
computational geometry.
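
For orientation, the sequential baseline that the parallel set cover results are compared against can be sketched as follows. This is the standard greedy heuristic, whose cover is known to be within a logarithmic factor of optimal; it is not the NC algorithm of Berger, Rompel, and Shor, and the function name and the sample instance are illustrative assumptions.

```python
def greedy_set_cover(universe, subsets):
    """Return indices of subsets chosen greedily until the universe is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset covering the most still-uncovered elements.
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical instance: five elements and four candidate subsets.
example_subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
cover = greedy_set_cover(range(1, 6), example_subsets)
print(cover)
```

Each round of the greedy loop depends on the previous one, which is why a parallel algorithm with the same approximation guarantee needs a different idea.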

Bard Bloom

Bloom, working with Meyer and Istrail (Wesleyan), is studying the denotational semantics
of parallel and nondeterministic processes. Dana Scott’s very successful models for the
semantics of sequential, deterministic programs do not extend naturally to the more general
domain. There are a number of proposals for a replacement; Meyer and Bloom are investigating
several of these models. One central question in semantics is, “when shall we consider
two programs equivalent?” Two proposed notions are trace congruence (used in Hoare’s
language CSP and variants) and bisimulation (used in Milner’s SCCS). Bloom, Meyer, and
Istrail have found an extension of SCCS in which the two notions coincide. The new operation
is somewhat peculiar in nature; they have shown that no finite set of operators defined
in a clean way can cause the two to coincide. Similarly, bisimulation cannot be understood
as equivalence with respect to any set of reasonable experiments. It can be understood in a
probabilistic setting; however, the translation from the usual setting to the probabilistic one
is not effective.
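
The gap between the two notions can be seen on a textbook pair of processes, a.(b + c) and a.b + a.c. The sketch below is a simplification under stated assumptions: it compares plain trace equivalence (not the trace congruence discussed above) with bisimilarity, uses a naive pruning procedure rather than an efficient one, and the state names and dictionary encoding are hypothetical.

```python
def traces(lts, state):
    """All finite action sequences from `state` (assumes an acyclic system)."""
    result = {()}
    for action, nxt in lts.get(state, []):
        result |= {(action,) + rest for rest in traces(lts, nxt)}
    return result


def bisimilar(lts, s, t):
    """Naive greatest-fixed-point check: start from all pairs, prune failures."""
    states = set(lts) | {nxt for moves in lts.values() for _, nxt in moves}
    rel = {(p, q) for p in states for q in states}

    def matched(p, q):
        p_moves, q_moves = lts.get(p, []), lts.get(q, [])
        # Every move of p must be answered by an equally labelled move of q ...
        forth = all(any(a == b and (p2, q2) in rel for b, q2 in q_moves)
                    for a, p2 in p_moves)
        # ... and every move of q by an equally labelled move of p.
        back = all(any(a == b and (p2, q2) in rel for a, p2 in p_moves)
                   for b, q2 in q_moves)
        return forth and back

    changed = True
    while changed:
        changed = False
        for pair in list(rel):
            if not matched(*pair):
                rel.discard(pair)
                changed = True
    return (s, t) in rel


# p0 is a.(b + c); q0 is a.b + a.c. States without entries have no moves.
LTS = {
    "p0": [("a", "p1")],
    "p1": [("b", "p2"), ("c", "p3")],
    "q0": [("a", "q1"), ("a", "q2")],
    "q1": [("b", "q3")],
    "q2": [("c", "q4")],
}
same_traces = traces(LTS, "p0") == traces(LTS, "q0")
same_behaviour = bisimilar(LTS, "p0", "q0")
print(same_traces, same_behaviour)
```

The two processes perform the same sequences of actions, but after the first a the second has already committed to b or to c, which bisimulation detects and traces do not.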

This work led to a notion of “ready simulation” which seems to have the same sorts of
formal properties as bisimulation (various alternate definitions, complete axiomatizations
and polynomial time decision procedures for finite processes, and so forth), but can also be
understood as congruence with respect to a fairly reasonable language.

A classic paper in denotational semantics (Gordon Plotkin’s LCF Considered as a Programming
Language) gives two kinds of semantics for a simple but extremely powerful language
based on typed lambda calculus. One semantics is operational, describing how a particular
interpreter computes; the other kind is denotational, assigning meaning to the programs in
moderately familiar mathematical terms, using several varieties of Scott domains. The paper
shows that the two semantics coincide in a weak sense (computational adequacy: two integer
terms evaluate to the same constant if they have the same denotational meaning), but not
in a stronger sense (full abstraction: two routines behave identically in all contexts if they
have the same denotational meaning). The programming language can be extended by the
addition of a “parallel conditional” such that the extended language is fully abstract for one
of the denotational models. The classic paper shows that this extension is not fully abstract
for the other models.

However, one of the other denotational models (Scott domains built from complete lattices
rather than cpo’s) is mathematically appealing, and it is somewhat surprising that the classic
paper did not find a fully abstract extension of LCF using this model. However, this is not
the author’s oversight. Bloom has shown that there is no extension of LCF
with a reasonable evaluator for which this model is fully abstract, where “reasonable” means
that an arithmetic expression can evaluate to at most one value. If the evaluator is not
required to be reasonable in this sense, there is a simple extension of LCF after the spirit of
the classic paper which is fully abstract for the lattice model. If the evaluator is allowed to
have a technically peculiar property, it can be made fully abstract for virtually any model
of the typed lambda calculus.

Bloom and Riecke have been investigating similar questions for the so-called “lifted Scott
domains”. Ordinary functional languages exhibit a particular behavior on higher-order terms: if
a program evaluates to a function, it stops and prints “function”, even if the function will
always  diverge  when  applied  to  any  argument.  In  ordinary  Scott  domains,  there  is  no 
semantic  difference  between  the  function  which  always  diverges  given  any  argument,  and  a 
divergent  computation  of  functional  type.  Lifted  domains  repair  this  deficiency.  Bloom  and 
Riecke  have  achieved  a  close  correspondence  between  operational  and  denotational  semantics 
for  this  setting,  and  are  investigating  axiom  systems. 

Avrim  Blum 

Blum  has  been  working  in  two  main  areas  this  past  year  and  has  also  finished  his  Master’s 
thesis  [55]  under  Rivest’s  supervision. 

He continued his work with Rivest on problems in computational learning theory, in particular
computational complexity issues in the training of neural networks. One result of this
work is a proof that training a very simple neural network with only three computational
nodes is NP-complete. This work was presented at the NIPS and COLT conferences [57].

Blum  has  also  been  working  on  approximate  graph  coloring.  The  3-coloring  problem  is  one 
of  the  most  well  known  NP-complete  problems,  but  there  is  an  enormous  gap  between  the 
results  achieved  by  the  best  approximation  algorithms  for  this  problem  and  the  best  lower 
bounds  known.  Blum  devised  a  new  approximation  algorithm  [56]  that  reduced  this  gap 
somewhat  and  introduced  different  techniques  for  attacking  this  problem. 

Thomas  H.  Cormen 

Cormen continued his work on the textbook Introduction to Algorithms with Leiserson and
Rivest. He plans to start working on parallel computing research over the summer.

Lenore  Cowen 

Cowen continues work with Goldwasser on two areas: key exchange protocols and information
theoretic  properties  of  private  functions. 

Claude  Crepeau 

Crepeau’s current research interest is mainly the study of two-party cryptographic protocols.
His earlier study of disclosure protocols [63][64] evolved into a series of results
[86][89][88][177][87] essentially stating that very complex two-party protocols known as fair
oblivious circuit evaluation (see [87] for definition) can be achieved from very simple devices.
Such a device can be a simple noisy channel, for instance. Another such possible device follows
the lines of Bennett and Brassard and relies on the correctness of quantum physics. This
work was accomplished in part while Crepeau was visiting Aarhus University (Denmark) in
the summer.

Crepeau is currently completing his Ph.D. thesis, which will cover some recent material selected
from the above papers. He is expected to defend his thesis during the summer.

Zero-knowledge protocols are another of Crepeau’s favorite research topics. While visiting
IBM Almaden Research Center last summer, he contributed two papers on this subject
[65][62]. These two papers are follow-ups to [61], in that they are concerned with a
model where the prover involved in the protocol is computationally bounded.

Aditi  Dhagat 

During fall 1988, Dhagat was a teaching assistant for the graduate course in theory of
computation taught by Sipser. During the year, she worked with Sipser in complexity theory
and cryptography, trying to construct a pseudorandom number generator secure against
monotone circuits without any unproven assumptions. In the process, they looked at monotone
statistical tests and showed that there exist exponential size monotone statistical tests
which break the security of the Nisan-Wigderson generator based on parity. They have also
shown that if there exist monotone functions which are hard to approximate for polynomial
size monotone circuits, then there exist pseudorandom number generators secure against
polynomial size monotone circuits.

Dhagat  plans  to  continue  to  work  on  this  question  during  the  summer  of  1989. 

Michael  Ernst 

Ernst became a graduate student at MIT in January 1989. He worked under Meyer’s
supervision to prove a monotone model adequate for recursive program schemes. In order
to prove adequacy, most proofs in the literature directly use a stronger continuous model
which simplifies the proof and which implies the weaker result; the typical approach is via
Tait’s method of computability [265][296]. The introduction of continuity is poorly motivated
from an expository and pedagogical viewpoint; we would hope to be able to show the
result directly [233].

Ernst and Meyer [100] found that this was not possible; although they were able to produce
a  clear  exposition  of  the  concept,  at  one  crucial  point  continuity  was  required.  While  the 
result  holds  for  the  monotone  model  without  mention  of  continuity,  a  weaker  assumption  of 
monotonicity  in  the  proof  leads  to  a  failure  of  the  result. 

Ernst  spent  much  of  1989  finishing  up  his  undergraduate  requirements;  he  plans  to  get 
started on his S.M. thesis during the upcoming year.

Lance  J.  Fortnow 

Working with Sipser, Fortnow examined the relationship between probabilistic polynomial
time and probabilistic linear time. They showed [112] the existence of an oracle under which
the two classes are identical. This result means the techniques of separating the deterministic
and nondeterministic time hierarchies will not work for probabilistic computation. They also
show many other results relating to probabilistic computation and linear time.

During the spring of 1989, Fortnow spent the semester writing his thesis [111] and looking
for a job.

Jeff Fried

Fried  continued  research  on  the  architecture,  design,  and  analysis  of  communication  networks 
for  use  in  parallel  computers  and  telecommunications.  He  completed  a  Master’s  thesis  [117], 
supervised  by  Leiserson,  which  includes  two  switch  designs  for  such  networks  [120][116]. 
Followup  work  in  this  area  has  included  an  improved  circuit  design  for  one  of  the  VLSI 
functions  used  in  these  designs  [118],  and  a  study  of  some  of  the  modularity  tradeoffs  found 
in  sparse  circuit-switched  interconnection  networks  [119]. 

Fried  is  currently  working  on  a  number  of  problems  related  to  the  architecture  and  control 
algorithms  needed  for  high  performance  communication  networks.  This  work  includes  a 
study  of  the  impact  of  synchrony  on  the  performance  of  distributed  algorithms,  and  design 
studies  of  a  VLSI  packet  router  for  use  in  broadband  networks  [115]. 

Sally  A.  Goldman 

Goldman  has  been  working  with  Rivest  on  studying  learning  algorithms  for  concepts  that 
have  polynomial  sized  instance  spaces  [129][130].  They  have  focused  on  polynomial  prediction 
algorithms  in  which  the  learner  predicts  a  value  for  each  entry  in  the  instance  space  and 
then  receives  feedback  as  to  whether  the  prediction  was  correct.  They  consider  the  worst 
case  mistake  bounds  under  several  models  for  the  selection  of  the  instances.  Often,  good 
mistake  bounds  are  obtained  by  the  halving  algorithm.  They  discuss  an  approximate  halving 
algorithm and show how a fully polynomial randomized approximation scheme can be used
to  implement  (with  high  probability)  the  approximate  halving  algorithm.  They  demonstrate 
these  techniques  on  the  problem  of  learning  a  total  order  on  a  set  of  n  elements. 
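
The prediction loop described above can be sketched with the exact halving algorithm: predict with the majority of the concepts still consistent with the feedback, then discard every concept that disagreed with the true label. Each mistake removes at least half of the remaining concepts, so the number of mistakes is at most log2 of the class size. The threshold concept class below is a hypothetical stand-in (not the total-order problem studied in the work), and the sketch enumerates the class explicitly rather than using the approximate halving algorithm.

```python
def halving_learner(concepts, instances, target):
    """Run the halving algorithm; return (mistakes, surviving concepts)."""
    version_space = list(concepts)
    mistakes = 0
    for x in instances:
        votes = sum(1 for c in version_space if c(x))
        prediction = 2 * votes >= len(version_space)  # majority vote, ties -> True
        truth = target(x)                             # feedback from the environment
        if prediction != truth:
            mistakes += 1
        # Keep only the concepts that agree with the revealed label.
        version_space = [c for c in version_space if c(x) == truth]
    return mistakes, version_space


# Hypothetical class: thresholds c_t(x) = (x >= t) for t = 0..8 over instances 0..7.
concepts = [lambda x, t=t: x >= t for t in range(9)]
target = concepts[5]
mistakes, survivors = halving_learner(concepts, range(8), target)
print(mistakes, len(survivors))
```

Enumerating the version space is only feasible for small classes; the approximate halving algorithm replaces the exact vote count with an estimate.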

Goldman  has  also  been  working  with  Rivest  and  Schapire  on  the  particular  problem  of 
learning  a  binary  relation  between  n  objects  of  one  kind  and  m  of  another  [130].  This  can 
be  viewed  as  the  problem  of  learning  an  n  x  m  binary  matrix.  Here,  the  instance  space 
contains  the  elements  of  the  matrix  and  is  thus  of  polynomial  size.  They  present  numerous 
upper  and  lower  bounds  on  the  number  of  mistakes  that  prediction  algorithms  can  make 
under  different  models  for  the  selection  of  the  instances. 

Goldman  has  also  done  some  research  in  the  field  of  computational  geometry.  In  particular, 
she  developed  an  algorithm  to  compute  the  greedy  triangulation  of  an  arbitrary  point  set 
that takes O(n^2 lg n) time and O(n) space [128]. In January, Goldman participated in the
robot building project led by Schapire.

Ronald  I.  Greenberg 

Greenberg worked on three main topics during the past year: networks for general purpose
parallel computation, multi-layer channel routing, and bounds on the area for VLSI
implementations  of  finite-state  machines. 

Recent work on networks for general purpose parallel computation is reported in [136].
This paper provides several extensions and generalizations of earlier work on the problem of
designing “universal” networks which can simulate any other network of comparable physical
size with only polylogarithmic overhead in simulation time.

On the topic of multi-layer channel routing, Greenberg has been seeking improvements upon
algorithms recently developed with Ishii and Sangiovanni-Vincentelli (University of California,
Berkeley) for the program MulCh [137]. The basic approach of MulCh is to divide
a multi-layer problem into essentially independent subproblems of one, two, or three layers.
A main step in MulCh is to greedily partition the nets once a set of layer groups has
been  determined.  As  each  net  is  considered,  it  is  assigned  to  the  group  where  the  resulting 
subproblem  seems  to  be  the  one  requiring  the  least  channel  width.  For  testing  the  required 
channel  width  of  single-layer  partitions,  Greenberg  and  Miller  Maley  (Princeton  University) 
have  devised  algorithms  which  are  more  efficient  than  naive  approaches  involving  complete 
routing of the layer. Greenberg is also developing “incremental” algorithms to quickly determine
the effect on certain subproblem characteristics when a new net is added, by taking
advantage  of  knowledge  derived  from  earlier  computations  on  the  subproblem. 

Finally, Greenberg and Mike Foster (Columbia U. and NSF) have derived lower bounds on
the  area  required  for  VLSI  layout  of  finite-state  machines  [113].  These  lower  bounds  show 
that  naive  layout  approaches  are  optimal  in  the  worst  case. 

Michelangelo  Grigni 

Grigni  is  a  third  year  graduate  student  supervised  by  Sipser.  His  thesis  research  considers  the 
construction  of  fast  robust  broadcasting  networks,  continuing  work  begun  with  Peleg  [139] 
of the Weizmann Institute. Current work with Bertsimas of the Sloan School extends their
recent result [50] on the suboptimality of the space-filling curve heuristic for the Euclidean
TSP. Other work with Bertsimas includes a survey of various NP-complete problems.
Grigni continues searching for new attacks on the matrix multiplication exponent
problem.

Carolyn  M.  Haibt 

Haibt  spent  most  of  the  year  on  coursework,  but  also  continued  work  with  Tardos.  They 
are  currently  working  on  algorithms  for  the  generalized  network  flow  problem.  This  is  a 
generalization of the maximum flow problem, where each edge has an associated gain factor,
and  flow  is  multiplied  by  this  factor  when  it  passes  through  an  edge. 
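
The effect of gain factors can be illustrated on a single path. The sketch below is only a toy calculation, not an algorithm for the generalized flow problem; it assumes positive gains and assumes that each capacity bounds the flow entering its edge, and the function name and sample numbers are hypothetical.

```python
def max_path_flow(edges):
    """edges: list of (capacity, gain) pairs along one source-to-sink path.

    Returns (amount sent from the source, amount arriving at the sink).
    """
    multiplier = 1.0           # units reaching the current edge per unit sent
    max_send = float("inf")
    for capacity, gain in edges:
        # The flow entering this edge is max_send * multiplier; keep it within capacity.
        max_send = min(max_send, capacity / multiplier)
        multiplier *= gain     # flow is scaled by the gain as it crosses the edge
    return max_send, max_send * multiplier


# Hypothetical path: the first edge halves the flow, the second doubles it.
sent, received = max_path_flow([(10, 0.5), (4, 2.0)])
print(sent, received)
```

Because flow is not conserved along an edge, the bottleneck is determined by capacities scaled by the gains that precede them, which is what separates this problem from ordinary maximum flow.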

Mark D. Hansen

Hansen has been studying graph embeddings with applications to parallel processing problems.
In [153], he examines the problem of finding optimal geometric embeddings in the
plane and higher dimensional spaces. Given an undirected graph G with n vertices and a
set P of n points in R^2, the geometric embedding problem consists of finding a bijection from
the vertices of G to the points in the plane which minimizes the sum total of edge lengths of
the embedded graph. In general, this problem is NP-complete as it contains the Euclidean
traveling salesman problem as a special case. Hansen gives approximation algorithms for
embedding many of the important graphs studied in the theory of parallel computation. He
presents fast algorithms for embedding d-dimensional grids in the plane which are within
a factor of O(log n) times optimal cost for d > 2 and O(log^2 n) for d = 2. He also shows
that any embedding of a hypercube, butterfly, or shuffle exchange graph must be within an
O(log n) factor of optimal cost. When the points of P are randomly distributed or arranged
in a grid, he is able to use the results of Leighton and Rao [202] to give a polynomial time
algorithm which can embed arbitrary weighted graphs in these points with cost within an
O(log^2 n) factor of optimal.

Hansen shows how the algorithms developed in [153] for geometric embeddings can be used
to give solutions which are within an O(log^2 N) factor of optimal to problems of performance
optimization for array-based parallel processors in the following areas: communication load
balancing, dynamic allocation of jobs to processors, reconfiguring around faults, and simulating
other architectures. He also indicates some applications to wafer scale integration
problems and the dynamic configuration of distributed computing networks.

Working  with  Leighton,  Hansen  was  able  to  apply  some  of  the  techniques  developed  in  [153] 
to  give  an  time  algorithm  for  solving  the  Euclidean  traveling  salesman  problem.  The 

previous  best  running  time  for  this  algorithm  was  (9(logW2^).  A  year  earlier  Smith  [293] 
independently  gave  an  algorithm  with  the  same  running  time,  using  different  techniques 
involving the Lipton-Tarjan planar separator theorem [211]. Hansen and Leighton are currently
investigating the possibility of developing practical heuristics for solving Euclidean
TSP  using  the  ideas  in  these  two  algorithms. 

Alexander T. Ishii

Ishii completed his Master’s thesis [168], which describes his model for VLSI timing analysis.
The model maps continuous data domains, such as voltage, into discrete, or digital, data
domains, while retaining a continuous notion of time. The majority of the thesis concentrates
on developing lemmas and theorems that can serve as a set of “axioms” when analyzing
algorithms based on the model. Key axioms include the fact that circuits in the model
generate only well defined digital signals, and the fact that components in the model support
and accurately handle the “undefined” values that electrical signals must take on when they
make a transition between valid logic levels. In order to facilitate proofs for circuit properties,
the class of computational predicates is defined. A circuit property can be proved by simply
casting the property as a computational predicate.

Ishii has also been working with Greenberg and Sangiovanni-Vincentelli of Berkeley on a
multi-layer channel router for VLSI circuits, called MulCh [137]. While based on the
CHAMELEON system developed at Berkeley, MulCh incorporates the additional feature
that nets may be routed entirely on a single interconnect layer (CHAMELEON requires that the
vertical and horizontal sections of a net be routed on different interconnect layers). When
used on sample problems, MulCh shows significant improvements over CHAMELEON in
area, total wire length, and via count.

Ishii continued work, begun with Maggs, on a new VLSI design for a high speed multiport
register  file.  Design  goals  include  short  cycle  time  and  single-cycle  register  window  context 
changes.  This  research  began  as  an  advanced  VLSI  class  project,  under  the  supervision  of 
Knight  of  the  MIT  Artificial  Intelligence  Laboratory. 

Lalita  A.  Jategaonkar 

Jategaonkar  has  been  working  jointly  with  Meyer  on  further  developing  research  begun  last 
year  at  Bell  Laboratories  with  Mitchell.  In  [170],  Jategaonkar  and  Mitchell  develop  an 
extension  of  the  programming  language  ML  in  which  a  restricted  object-oriented  style  can 
be  achieved.  In  keeping  with  the  framework  of  ML,  a  type  derivation  system  and  a  type 
inference algorithm are presented. It is proved that the algorithm is sound and complete
with  respect  to  the  type  derivation  system,  and  that  it  infers  a  most  general  typing  of  every 
typeable  expression  in  the  language.  This  research  will  comprise  Jategaonkar’s  forthcoming 
Master’s  thesis. 

In order to show that the type derivation system is “reasonable” in a precise, technical
sense, Jategaonkar and Meyer have been developing an interpreter for this language. They
aim  to  show  that  the  interpreter  satisfies  certain  desirable  properties,  and  that  the  interpreter 
and  the  type  derivation  are  well  matched  in  the  sense  that  no  typeable  expression  in  the 
language  reduces  to  a  type  error.  Jategaonkar  is  also  interested  in  further  extending  ML  to 
support  subtyping  of  abstract  types  and  recursive  types.  Another  direction  of  research  she 
is  interested  in  pursuing  is  to  develop  a  semantics  for  these  extensions  of  ML. 

Trevor  Jim 

Jim  entered  the  department  in  September  1988.  His  previous  work  with  Appel  [16]  on  a 
novel  code  generator  for  the  language  ML  was  presented  at  POPL  ’89  in  January. 

Under  the  direction  of  Meyer,  he  has  been  studying  the  work  of  Berry  and  Curien  [90][49] 
on  models  of  PCF  [265]  based  on  stable  functions  and  sequential  algorithms.  These  models 
were developed as alternatives to the standard model, which contains troublesome “non-sequential”
elements. Jim is trying to find extensions of PCF for which the alternate models
are  fully  abstract. 

Joe  Kilian 

Kilian spent most of his time working on his thesis, “Randomness in Algorithms and Protocols”
[176], which he recently completed. He also did some work in efficient zero-knowledge
interactive proofs, bounded interaction zero-knowledge proofs, noninteractive zero-knowledge
proofs, multi-prover zero-knowledge proofs, space-bounded secure protocols, communication
lower bounds for secret sharing, and IP vs. AM.

A troubling issue in theoretical cryptography is the chasm between what is efficient in theory
and what is efficient in practice. One area in which this gap is particularly large is in
zero-knowledge proofs for NP predicates. Suppose one wishes to prove in zero-knowledge that
some circuit, C(x_1, ..., x_n), is satisfiable. The previously most efficient solutions to this
problem ([61][167]) required the prover and the verifier to send O(k|C|) bits back and forth
per iteration of the protocol. Here, |C| denotes the number of gates in the circuit C, and k
denotes the security parameter. Using pseudorandom generators, Kilian [177] has exhibited a
protocol in which the prover and the verifier communicate only 0{\C\  -\-k^) bits per iteration
of the protocol. In real life circumstances, |C| is likely to be very large, in which case this
protocol should behave better in practice as well as in theory.

Zero-knowledge proofs typically require a great deal of interaction between the prover and
the verifier. It is of both theoretical and practical interest to see how much interaction is
truly needed, which led to the notions of bounded interaction protocols and noninteractive
protocols with a common random string. In bounded interaction protocols, the prover and
the  verifier  interact  for  time  polynomial  in  the  security  parameter.  After  the  interaction 
phase,  the  prover  proves  theorems  to  the  verifier  by  sending  him  a  letter  in  the  mail.  In  a 
noninteractive  protocol  with  a  common  random  string,  the  prover  and  verifier  do  not  interact 
at  all,  but  are  both  presented  with  a  uniformly  distributed  string  of  length  polynomial  in 
the  security  parameter. 

Prior  to  Kilian’s  work,  there  existed  three  proposed  protocols  for  these  scenarios,  due  to 
Blum-Feldman-Micali [58], De Santis-Persiano-Micali [93], and Micali-Ostrovsky [250].

Kilian  developed  a  very  simple  and  efficient  protocol  for  bounded  interaction  zero-knowledge 
proofs,  and  a  provably  secure  protocol  for  noninteractive  zero-knowledge  with  a  common 
random string. Both of these protocols’ security is based on reasonable cryptographic
assumptions. His protocol for bounded interaction zero-knowledge proofs is more communication
efficient than the best previously known interactive zero-knowledge protocols. In both
of  these  protocols,  the  prover  can  prove  polynomially  many  polynomial-sized  theorems. 

In [42], Kilian along with Ben-Or, Goldwasser, and Wigderson, developed a multiprover
generalization  of  interactive  proof  systems.  They  showed  that,  informally,  anything  two 
provers  could  prove,  they  could  prove  in  statistical  zero-knowledge.  Recently,  Kilian  has 
strengthened  this  result,  showing  that  anything  two  provers  could  prove,  they  could  prove 
in  perfect  zero-knowledge. 

With Nisan, Kilian applied knowledge complexity notions from cryptography to space-bounded
automata [179]. They developed protocols in this scenario for a number of cryptographic
tasks: secret key exchange, bit commitment, secure circuit evaluation, and zero-knowledge
proofs. In the space-bounded scenario, the security of these protocols may be
proven without any assumptions whatsoever. Furthermore, these protocols are robust against
adversaries who have asymptotically more space than used by the good players.

Nisan and Kilian also investigated upper and lower bounds for secret sharing. They consider
schemes in which a bit b is shared among n players, such that:


1.  A majority of the n players can reconstruct b; and
2.  A nonmajority of the players cannot reconstruct any information about b.


They show a lower bound of Ω(n log n) on the total number of bits that must be distributed
amongst  the  n  players.  They  also  consider  a  weakened  form  of  secret  sharing,  in  which  2n/3 
players  can  reconstruct  b,  and  n/3  players  learn  nothing.  They  use  coding  theory  to  prove 
the  existence  of  secret  sharing  schemes  that  are  more  efficient  than  the  lower  bounds  proven 
for  the  more  stringent  conditions. 
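
A majority-threshold scheme of the kind described above can be sketched with Shamir's polynomial secret sharing, a standard construction that is swapped in here purely for illustration and is not taken from the work of Nisan and Kilian. The field size `PRIME` and the function names are assumptions of this sketch; any prime larger than n would do, so each share needs on the order of log n bits.

```python
import secrets

PRIME = 2**31 - 1  # illustrative field size (a Mersenne prime)


def make_shares(secret, n, threshold):
    """Split `secret` into n shares; any `threshold` of them reconstruct it."""
    # Random polynomial of degree threshold - 1 with constant term `secret`.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):   # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange interpolation at 0 over the field of integers modulo PRIME."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Modular inverse of den via Fermat's little theorem.
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


n = 5
threshold = n // 2 + 1               # a majority of the players
shares = make_shares(1, n, threshold)
print(reconstruct(shares[:threshold]))
```

With fewer than `threshold` shares, every candidate secret remains consistent with some polynomial, so a nonmajority learns nothing about the bit.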

A  classic  theorem  of  Goldwasser  and  Sipser  [135]  states  that  IP=AM.  In  other  words,  public 
coins  are  as  powerful  as  private  coins  for  interactive  proof  systems.  Kilian  found  a  very 
simple  proof  of  this  fact,  using  random  selection  techniques  from  [133].  This  proof  will  be 
included  in  a  paper  on  random  selection,  with  Oded  Goldreich,  Johan  Hastad,  and  Yishay 
Mansour. 

Shlomo  Kipnis 

Kipnis has been investigating parallel architectures and interconnection networks. He is
trying to further explore the power of bussed interconnection schemes in routing
permutations and realizing various communication patterns. Bussed interconnection schemes and
their relation to difference covers were explored by Kilian, Kipnis, and Leiserson in [178]. In
addition, he is investigating various arbitration schemes for bus-based architectures.

Recently, he studied the problem of range queries in computational geometry. Answering range
queries is a fundamental problem in computational geometry with applications to computer graphics
and  database  retrieval  systems.  He  compiled  a  survey  report  on  three  different  methods  for 
range  queries  in  computational  geometry  [180]. 

Richard  R.  Koch 

Koch’s Ph.D. thesis [183] is a probabilistic analysis of routing on a parallel architecture.
Koch analyzes the bandwidth of the butterfly network. In a dilated butterfly network, nodes
are connected by parallel edges instead of just one edge as in the usual butterfly network.
He proves a previous conjecture that the expected bandwidth of an N-node dilated butterfly
network is Θ(N (log N)^(-1/q)), where q is the number of parallel edges. He explores some
implications of his results for design tradeoffs. He also develops interesting techniques for
finding asymptotics for nonlinear systems of recurrences; many of the results appeared
in [182].

In [184], Koch, Leighton, Maggs, Rao, and Rosenberg study the problem of emulating T_G
steps of an N_G-node guest network on an N_H-node host network. Although many isolated
emulation results have been proved for specific networks in the past, and measures such as
dilation and congestion were known to be important, the field has lacked a model within
which general results and meaningful lower bounds can be proved. They attempt to provide
such a model, along with corresponding general techniques and specific results in this paper.

Dina Kravets

Kravets spent most of the year working with Aggarwal and Park on problems in
computational geometry. In January, she finished her Master’s thesis [191], which included the
following results:


1. An algorithm to find all the farthest neighbors of every vertex on a convex n-gon in
O(n) time.

2. An O(n^2) algorithm to sort the distances of all the vertices of a convex n-gon with
respect to each vertex of the convex n-gon.

3. An O(kn log k) time algorithm to find k farthest vertices for every vertex of a convex
n-gon.

4.  A  worst-case  optimal  algorithm  to  sort  a  set  of  numbers  given  lower  bounds  on  the 
ranks. 


The  first  of  these  algorithms  appeared  in  the  Information  Processing  Letters  [12].  Park  and 
Kravets  are  planning  to  improve  the  third  result  and  submit  it  to  the  ACM-SIAM  Symposium 
on  Discrete  Algorithms. 

Kravets  is  also  looking  at  some  problems  in  parallel  computation  and  VLSI  with  Leighton. 

Leonid A. Levin

The  topic  of  Levin’s  research  in  1988-89  may  be  called  “Randomness  in  Computing.”  In 
[206],  Levin  and  Venkatesan  propose  the  first  intractability  results  for  random  instances 
of  NP  problems.  NP-complete  problems  should  be  hard  on  some  (maybe  extremely  rare) 
instances.  Generic  instances  of  many  such  problems  proved  to  be  easy.  This  paper  shows 
the  intractability  of  random  instances  of  a  graph  coloring  problem.  Applications  of  average 
case  intractability  are  considered  in  two  other  papers:  [132][166]. 

Blum and Micali [59] discovered permutations f with “hard-core” predicates b(x) that
cannot be efficiently guessed from f(x) with a noticeable correlation. Both b and f are easy to
compute. Yao [314] modifies any one-way permutation f into f* which has a hard-core
predicate. Its security may be lower than any constant power of the security of f and is too
small for practical applications. Goldreich and Levin [132] prove that most linear predicates
are hard-cores for every one-way function and have almost the same security. The result
extends to multiple (up to the logarithm of security) hidden bits and has wide applicability
to pseudorandomness, cryptography, etc.
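
To make the notion of a linear predicate concrete, the sketch below computes the inner product modulo 2 of an input x with a random string r, and pairs it with g(x, r) = (f(x), r). The function toy_f is a hypothetical placeholder with no claimed one-wayness, and the names inner_product_bit and extend_with_predicate are assumptions made for illustration; only the predicate itself follows the description above.

```python
import secrets


def inner_product_bit(x: int, r: int) -> int:
    """Linear predicate b(x, r) = <x, r> mod 2 on bit strings packed into ints."""
    return bin(x & r).count("1") % 2


def toy_f(x: int) -> int:
    # Hypothetical stand-in for f: modular exponentiation is used only as a
    # placeholder, and nothing is claimed about its hardness at this size.
    return pow(5, x, 2_147_483_647)


def extend_with_predicate(f, x: int, n_bits: int):
    """Return (g(x, r), b(x, r)) where g(x, r) = (f(x), r) for a random r."""
    r = secrets.randbits(n_bits)
    return (f(x), r), inner_product_bit(x, r)
```

The predicate is linear in x over GF(2): for a fixed r, the bit for x XOR y equals the XOR of the bits for x and y.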

Let an easily computable function f be one-way, i.e., for most x one cannot recover from f(x)
either (1) x by a polynomial time algorithm, or (2) an x' ∈ f^(-1)(f(x)) by a polynomial size
circuit. In case (1), to exclude useless f(x) = 0, the difference between Shannon entropies of
inputs and outputs of f is restricted to O(1). Impagliazzo, Levin, and Luby [166] show, based
on  [132],  that  the  existence  of  one-way  functions  in  the  sense  (1)  and  (2)  is  necessary  and 
sufficient  for  the  existence  of  pseudo-random  generators  secure  against  feasible  algorithms 
or  circuits,  respectively. 

In [205], Levin compares probability distributions of computational objects. The usual
distributions are concentrated on strings that differ little in any fundamental characteristic, except
their informational size (Kolmogorov complexity). This property distinguishes a class of
homogeneous probability measures suggesting various applications. In particular, it explains
why  the  average  case  NP-completeness  results  are  so  measure  independent,  and  offers  their 
generalization  to  this  wider  and  more  invariant  class  of  measures.  It  also  demonstrates  a 
sharp  difference  between  pseudo-random  strings  and  the  objects  known  before. 

Bruce  Maggs 

Maggs is studying the ability of a host network to emulate a possibly larger guest network
[184]. His collaborators in this research are Koch, Tom Leighton, Rao, and Rosenberg. An
emulation is work-preserving if the work (processor-time product) performed by the host is
at most a constant factor larger than the work performed by the guest. Such an emulation is
efficient because it achieves optimal speedup over a sequential emulation of the guest. Many
work-preserving emulations for particular networks have been discovered. For example, the
N-node butterfly can emulate an N log N-node shuffle-exchange graph and vice versa. On
the other hand, a work-preserving emulation may not be possible unless the guest graph is
much larger than the host. For example, a linear array cannot perform a work-preserving
emulation of a butterfly unless the butterfly is exponentially larger than the array. These positive
and  negative  results  provide  a  basis  for  comparing  the  relative  power  of  different  networks. 

Maggs  is  also  studying  algorithms  for  routing  packets  on  faulty  bounded-degree  networks. 
With Leighton, he developed a scheme for routing N packets on an N-node multibutterfly
network [303] in O(log N) steps even in the presence of many faulty nodes.

Yishay Mansour

Yishay Mansour has continued studying data transmission in communication networks. In
work with Schieber [229], he shows lower bounds for communication over non-FIFO
links. In work with Herzberg and Goldreich [131], he gives a randomized protocol for
communication over non-FIFO links. In work with Awerbuch and Shavit [31], he shows
how to achieve polynomial end-to-end communication.

In work with Linial and Nisan [208], he investigates constant-depth circuits using the Fourier
transform. They are able to show a quasi-polynomial time algorithm for learning this class.
Another work that is connected to learning is [40].

In work with Schieber and Tiwari [231], he continues to develop techniques to prove lower
bounds for integer computations. The work with Schieber and Tiwari [230] tries to explore
the complexity of approximating algebraic functions. In this work, techniques taken from
Approximation Theory are used to derive lower and upper bounds.


Mark  J.  Newman 

Newman continued work on fault-tolerant strategies for parallel computation. With Hastad
and Leighton [156], he demonstrated algorithms for reconfiguring hypercubes with faulty
components. After reconfiguration, the hypercubes retain all computational power (within
constant factors). The algorithms are successful with high probability, given that nodes
and  edges  fail  independently  and  with  constant  probability.  They  also  showed  how  to  route 
permutations  on  hypercubes  even  if  a  constant  fraction  of  the  cube’s  components  have  failed. 

With Leighton, Ranade, and Schwabe [201], Newman also showed how a dynamically
changing binary tree can be embedded in a hypercube so that computational and communication
overhead are low. Specifically, they produced randomized algorithms which embed any
growing and shrinking binary tree so that the resulting simulation requires only constant factor
overhead, with high probability.

Noam  Nisan 

Nisan  arrived  as  a  postdoc  in  the  theory  group  in  January  1989.  He  has  been  working  mainly 
on  problems  related  to  complexity  theory. 

Together with Babai and Szegedy [33], he proved lower bounds for the multiparty
communication complexity of certain simple functions. These bounds were used to obtain a
pseudorandom generator for Logspace without relying on any unproven assumptions.

In  [245],  Nisan  obtained  a  full  characterization  of  the  parallel  time  needed  to  compute  any 
boolean  function  on  a  CREW  PRAM  in  terms  of  the  function’s  decision  tree  complexity. 

In joint work with Linial [210], the question of obtaining approximate versions of the
inclusion-exclusion formula is tackled. Tight upper and lower bounds are proved for
several formulations of this question.

Nisan and Kilian [179] considered cryptographic protocols in the setting where all parties
are space-bounded. In this setting, they design secure protocols for a wide spectrum of
cryptographic problems. The security of these protocols is proved without relying on any unproven
assumptions.

In his joint work with Linial and Mansour [208], constant depth circuits are studied in terms
of their Fourier transform. It is shown that almost all of the power spectrum of a function in
AC^0 lies in the low coefficients. This fact is used to obtain a learning algorithm for constant
depth circuits, as well as several other results.
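
The statement about the power spectrum can be explored on small examples. The sketch below computes the Fourier coefficients of a boolean function by brute force and sums the squared coefficients on sets of size at most d. The ±1 encoding of outputs and the function names are assumptions made for illustration, and the running time is exponential in n, so this is only an experiment for tiny functions and not the learning algorithm of [208].

```python
from itertools import product


def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f: {0,1}^n -> {0,1}, with outputs read as +1/-1.

    Each set S is represented by its indicator tuple of length n.
    """
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for s in points:
        total = 0
        for x in points:
            sign_f = 1 - 2 * f(x)  # map output 0/1 to +1/-1
            parity = sum(a & b for a, b in zip(s, x)) % 2
            total += sign_f * (1 - 2 * parity)  # character chi_S(x)
        coeffs[s] = total / len(points)
    return coeffs


def weight_up_to_degree(coeffs, d):
    """Sum of squared coefficients over sets S with |S| <= d."""
    return sum(c * c for s, c in coeffs.items() if sum(s) <= d)
```

By Parseval's identity the squared coefficients sum to 1 under this encoding, so weight_up_to_degree reports the fraction of the spectrum lying on low-order coefficients.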

Marios  C.  Papaefthymiou 

Papaefthymiou began his studies as a graduate student at MIT in September 1988. He is
working on his SM thesis under the supervision of Leiserson.

His research focuses on the design of efficient algorithms for pipelining of combinational
circuitry.  A  general  framework  for  this  problem  has  been  given  by  Leiserson  and  Saxe  [204]. 

Papaefthymiou has given an O(E) optimal algorithm for minimum latency pipelining of
combinational circuitry with constrained clock period. He is also investigating methods for
pipelining combinational circuitry using the minimum number of registers.

James  K.  Park 

James K. Park spent most of the last year collaborating with Aggarwal (IBM, Yorktown
Heights) and Kravets on a number of problems relating to totally monotone arrays. Park’s
work with Aggarwal (described in [13][14], and another manuscript, “Parallel Searching in
Multidimensional Monotone Arrays,” currently in preparation) centers on the problem of
finding maximum entries in totally monotone arrays and applications of efficient sequential
and parallel algorithms for this problem to problems in computational geometry, dynamic
programming, string matching, and VLSI river routing. This work generalizes and extends
the results of [11]. (Park’s Master’s thesis [254], finished in January, is also on this
subject.) Park’s work with Kravets (described in Kravets’ Master’s thesis [191]) considers two
more  comparison  problems — sorting  and  computing  order  statistics — in  the  context  of  totally 
monotone  arrays  and  applications  of  efficient  solutions  to  these  problems. 

In  the  coming  year,  Park  plans  to  continue  his  research  relating  to  totally  monotone  arrays 
and  computational  geometry. 

Cynthia A. Phillips

Phillips developed an O(lg^2 n)-time, (n + e)/lg n-processor deterministic parallel algorithm
to contract general n-node, e-edge graphs to a single node. This algorithm is used as a
subroutine in an algorithm developed with Leiserson to contract n-node bounded-degree
graphs in O(lg n + lg^2 γ) time with high probability, where γ is the maximum genus of
any connected component. A deterministic version runs in time O(lg n lg* n + lg^2 γ). The
algorithm for bounded-degree graphs uses n/lg n processors [262]. The contraction algorithm
can be used to solve the connected-components, biconnected-components, and spanning-tree
problems.

Phillips, with Zenios of the University of Pennsylvania, completed a preliminary
experimental study of the solution of large assignment problems on the Connection Machine (TM)
multiprocessor. The assignment problem is also known as maximum-weight bipartite
matching. They developed heuristics to improve sequential “tail” behavior, which seems to limit
the usefulness of many current parallel algorithms for the assignment problem and related
flow problems [263].

Phillips will be writing her thesis this summer. Among the new research that will probably
be included is an analysis of the permutation distribution of the Benes network. In other
words, how many distinct ways can the switches of a Benes network be set to yield a given
permutation? If the permutations are well distributed, then pseudorandomly setting the
switches of a Benes network may yield a good pseudorandom permutation network.

Satish B. Rao

In [184], Koch, Leighton, Maggs, Rao, and Rosenberg study the problem of emulating T_G
steps of an N_G-node guest network on an N_H-node host network. Although many isolated
emulation results have been proved for specific networks in the past, and measures such as
dilation and congestion were known to be important, the field has lacked a model within
which general results and meaningful lower bounds can be proved. They attempt to provide
such  a  model,  along  with  corresponding  general  techniques  and  specific  results  in  this  paper. 

Leighton and Rao have developed an approximate min-cut max-flow theorem for a type of
multicommodity  flow  problem.  This  theorem  yields  an  approximation  algorithm  for  finding 
a separator in arbitrary graphs that costs at most O(log^2 n) times the optimal. They also
used  the  theorem  to  show  that  any  permutation  can  be  routed  on  an  arbitrary  network  so 
that the congestion of any edge and path length of any message is within an O(log n) factor
of  optimal.  In  joint  work  with  Maggs,  they  explore  the  problem  of  scheduling  messages  on 
paths  with  given  congestion  and  length  so  that  the  routing  time  is  minimized. 

Jon  G.  Riecke 

Riecke  continues  to  work  in  the  area  of  semantics  and  logic  of  programming  languages,  with 
two primary interests: the semantics of continuations, and the theory of “lazy”
(call-by-name) functional languages. Working jointly with Meyer, he investigated some seemingly
known — but  undocumented — problems  in  the  theory  of  continuations.  More  specifically, 
Meyer  and  Riecke  showed  that  either  programming  with  continuations  explicitly  or  using 
special “continuation-accessing” operators (e.g., Scheme’s call/cc) leads one to conclude
different  facts  about  code;  old  equivalences  between  programs  may  no  longer  hold  in  a 
setting  with  continuations.  The  implications  of  these  results  and  their  precise  statements 
are  reported  in  [234]  and  in  Riecke’s  SM  thesis  [273]. 

The theory of lazy languages, begun by Abramsky and Ong, has also become a focus of
Riecke’s work. Lazy functional languages pass arguments by name (that is, arguments are
not evaluated before passing), but nevertheless stop evaluating higher-order expressions
(functions) when they can build a closure. The usual Scott-style semantics do not predict this
termination behavior correctly: a divergent functional and a closure that always diverges have
the same meaning. Bloom and Riecke [54] developed a model for a typed lazy language that
accurately reflects the behavior of the interpreter. Cosmadakis and Riecke (in a forthcoming
paper) used the model to develop principles for reasoning about lazy programs, and proved
that equalities between terms in a fragment of the language are decidable.

In the past year, Riecke has also become interested in intuitionistic logic and type theory,
and their applications to the theory of programming languages. He will continue his reading,
as well as pursuing previous lines of research.

Phillip  Rogaway 

Rogaway is a third-year graduate student working under Micali. He has been working on
cryptography and complexity theory.

Rogaway’s Master’s thesis evolved into the CRYPTO-88 paper which easily won the award
for most coauthors [41]. This paper establishes that an injective one-way function suffices to
prove all of IP in computational zero-knowledge. It also shows that the “envelope model”
for bit commitment suffices to show all of IP has perfect zero-knowledge proofs.

Rogaway investigated generalized notions of knowledge complexity, e.g., protocols that
release a “small” (but nonzero) amount of information. Recently he has been working on
reducing  the  interaction  required  for  secure  distributed  computation. 

John  Rompel 

In January, Rompel completed his Master’s thesis [277] based on approximation algorithms
for graph coloring developed last year with Berger [45].

More recently, Rompel has been working on problems in the field of parallel algorithms.
Rompel, together with Berger [46], developed a general framework for removing randomness
from randomized NC algorithms whose analysis uses only polylogarithmic independence.
Previously no techniques were known to determinize those RNC algorithms depending on
more  than  constant  independence.  One  application  of  their  techniques  is  an  NC  algorithm 
for  the  set  discrepancy  problem,  which  can  be  used  to  obtain  many  other  NC  algorithms, 
including  a  better  NC  edge  coloring  algorithm.  As  another  application  of  their  techniques, 
they  provided  an  NC  algorithm  for  a  hypergraph  coloring  problem. 

Rompel, working with Berger and Shor [47], gave NC approximation algorithms for the
unweighted and weighted set cover problems. Their algorithms use a linear number of
processors and give a cover that has at most log n times the optimal size/weight, thus matching
the performance of the best sequential algorithms. Previously, there were no known parallel
algorithms for the general set cover problem. Berger, Rompel, and Shor devised a randomized
algorithm, depending on only pairwise independence, and then converted it to a
deterministic one. Furthermore, they applied their set cover algorithm to learning theory, giving an
NC algorithm to learn the concept class obtained by taking the closure under finite union or
finite intersection of any concept class of finite VC-dimension which has an NC hypothesis
finder. In addition, they gave a linear-processor NC algorithm for a variant of the set cover
problem first proposed by Chazelle and Friedman, and used it to obtain NC algorithms for
several problems in computational geometry.
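
For comparison with the parallel results above, the classical sequential greedy heuristic for unweighted set cover is sketched below. It is known to achieve a logarithmic approximation factor. This is the textbook sequential baseline and not the NC algorithm of Berger, Rompel, and Shor; the function name greedy_set_cover and the input representation (a list of Python sets) are assumptions made for illustration.

```python
def greedy_set_cover(universe, subsets):
    """Return indices of subsets chosen greedily until the universe is covered.

    universe: iterable of elements; subsets: list of sets over those elements.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset covering the most still-uncovered elements.
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen
```

For example, with universe {1, ..., 5} and subsets {1, 2, 3}, {2, 4}, {3, 4}, {4, 5}, the heuristic first takes {1, 2, 3} and then {4, 5}.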

Arie Rudich

Rudich began his first year as a graduate student at MIT in September 1988. He is working
on an SM thesis supervised by Meyer on dataflow theory, which should be complete by
January 1990.

His research aims to generalize recent results by Rabinovich and Trakhtenbrot [269] and by
Lynch and Stark [228], which precisely delimit the classes of dataflow networks for which
Kahn’s “Least Fixed Point Principle” [173] applies, showing that Kahn’s Principle fails
precisely where Brock-Ackerman-like anomalies [68] begin. Rabinovich and Trakhtenbrot
established this boundary without distinguishing completed and incompleted output streams.
Rudich aims to show that the results carry over to the more conventional model where the
completed/incompleted distinction is maintained.

Robert  E.  Schapire 

Schapire  continued  to  work  with  Rivest  on  the  problem  of  inferring  an  unknown  finite-state 
automaton from its input/output behavior. In [274], they introduce a powerful new
technique, based on the inference of homing sequences, for solving this problem in the absence
of a means of resetting the machine to a start state. Their inference procedures experiment
with the unknown machine, and from time to time require a teacher to supply
counterexamples to incorrect conjectures about the structure of the unknown automaton. In this setting,
they describe a learning algorithm that, with probability 1 - δ, outputs a correct
description of the unknown machine in time polynomial in the automaton’s size, the length of the
longest counterexample, and log(1/δ). They present an analogous algorithm that makes use
of a diversity-based representation of the finite-state system. Their algorithms are the first
that are provably effective for these problems in the absence of a “reset.” They also present
probabilistic algorithms for permutation automata which do not require a teacher to supply
counterexamples. For inferring a permutation automaton of diversity D, they improve the
best previous time bound by roughly a factor of D^3/log D.

In  January,  Schapire  led  a  team  participating  in  the  robot  building  contest  of  the  AI  Lab’s 
“Winter  Olympics.”  The  goal  of  their  project  was  to  build  a  robot  capable  of  performing 
some simple learning task. In particular, the robot they built, named S’bot (for
Smartbot or Spotbot), was able to learn from experience how to avoid running into walls and
other  obstacles.  Their  team  consisted  of  Amsterdam,  Blum,  Goldman,  Moore,  Rivest,  and 
Schapire. 

Schapire  has  also  been  working  with  Goldman  and  Rivest  on  the  problem  of  inferring  a 
binary  relation  [130]  between  n  objects  of  one  kind  and  m  of  another.  This  can  be  viewed 
as  the  problem  of  inferring  an  n  x  m  binary  matrix.  Their  goal  has  been  to  minimize  the 
number  of  prediction  mistakes  made  by  a  learner  presented  with  such  a  matrix  one  entry  at 
a time. They have been able to prove numerous upper and lower mistake bounds for several
variations  of  this  problem. 

Finally, Schapire has been looking at problems relevant to the distribution-free (“pac”)
learning model introduced by Valiant [304]. In [281], Schapire considers the problem of
improving the accuracy of a hypothesis output by a learning algorithm. He shows that
a model of learnability, called weak learnability, in which the learner is only required to
perform slightly better than guessing, is as strong as a model in which the learner’s error
can be made arbitrarily small. His result may have significant applications as a tool for
efficiently converting a mediocre learning algorithm into one that performs extremely well.
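
Some intuition for why a slightly-better-than-guessing learner can be amplified comes from a simple calculation: if three hypotheses each err with probability ε and their errors are independent, a majority vote errs with probability 3ε^2 - 2ε^3, which is smaller than ε whenever 0 < ε < 1/2. The sketch below iterates this map. The independence assumption and the function names are illustrative only; the actual construction in [281] is more subtle, since it must arrange for the combined hypotheses to behave this way.

```python
def majority_error(eps):
    """Error of a majority vote over three hypotheses with independent error eps."""
    return 3 * eps**2 - 2 * eps**3


def rounds_to_reach(eps, target):
    """Count how many nested majority votes drive the error down to target."""
    if not 0 <= eps < 0.5:
        raise ValueError("initial error must be below 1/2")
    if target <= 0:
        raise ValueError("target must be positive")
    rounds = 0
    while eps > target:
        eps = majority_error(eps)
        rounds += 1
    return rounds
```

Starting from ε = 0.4, one vote already lowers the error to 0.352, and repeated nesting shrinks it quickly once ε is small, because the map is roughly quadratic near zero.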


Leonard  Schulman 

Schulman spent most of his time on coursework this year. In the spring he developed an
algorithm for sorting n elements on an n-node ring of processors in the optimal time n/2.
This  requires  only  constant  capacity  at  each  node  in  the  word  model.  Mansour  proved  a 
closely  related  lower  bound  and  these  two  results  have  been  combined  in  a  joint  paper  to  be 
submitted  shortly. 

During  the  summer  of  1989,  Schulman  intends  to  read  under  the  guidance  of  Sipser. 

Eric  J.  Schwabe 

Schwabe  has  been  working  on  problems  involving  the  efficient  implementation  of  dynamic 
structures  on  fixed-connection  networks.  In  particular,  he  worked  with  Leighton,  Newman, 
and  Ranade  (Berkeley)  on  the  problem  of  dynamically  embedding  binary  trees  in  butterfly 
and hypercube networks [201]. Randomized embedding algorithms were found for both
networks which simultaneously optimize load (the maximum number of tree nodes mapped
to  a  processor)  and  dilation  (the  maximum  distance  in  the  network  between  adjacent  tree 
nodes)  for  trees  which  are  a  logarithmic  factor  larger  than  the  host  network.  An  improved 
algorithm  for  the  hypercube  was  found  which  optimizes  load  and  dilation  for  arbitrary  binary 
trees,  while  also  keeping  congestion  (the  number  of  times  a  hypercube  edge  is  ‘traced  over’  by 
an  embedded  tree  edge)  low.  Also,  lower  bounds  were  proved  which  show  that  deterministic 
algorithms  cannot  simultaneously  optimize  load  and  dilation. 
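
The load and dilation measures defined above are easy to compute for a given embedding. The sketch below does so for a tree placed on a hypercube, where the network distance between two processors is the Hamming distance between their labels. The function name load_and_dilation and the dictionary-based representation of the placement are assumptions made for illustration; the sketch only evaluates an embedding and is not one of the embedding algorithms of [201].

```python
from collections import Counter


def load_and_dilation(tree_edges, placement):
    """Measure an embedding of a tree into a hypercube.

    tree_edges: list of (u, v) pairs of tree nodes.
    placement: dict mapping each tree node to an integer hypercube label.
    Returns (load, dilation).
    """
    # Load: the largest number of tree nodes mapped to one processor.
    load = max(Counter(placement.values()).values())
    # Dilation: the largest Hamming distance between images of adjacent nodes.
    dilation = max(
        bin(placement[u] ^ placement[v]).count("1") for u, v in tree_edges
    )
    return load, dilation
```

Placing the four-node tree with edges (0, 1), (0, 2), (1, 3) on labels 00, 01, 10, 11 gives load 1 and dilation 1, whereas mapping three of its nodes to the same processor raises the load to 3.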

Schwabe  has  also  been  studying  the  relative  strengths  of  the  butterfly  and  shuffle-exchange 
graphs  as  interconnection  networks.  He  proved  that  normal  hypercube  algorithms  (those 
which  use  only  one  dimension  of  hypercube  edges  at  a  time,  and  adjacent  dimensions  in 
consecutive time steps) can be simulated on a butterfly network with only a constant
slowdown, a result which was previously known only for the shuffle-exchange graph. A version
of this result is being prepared for journal submission. In addition, he recently discovered a
one-to-one embedding of the butterfly into the shuffle-exchange graph with constant dilation
and congestion, and expansion improving a result of Koch et al. [184].

Over the next year, Schwabe plans to work on relating the ideas in [201] to other problems
in parallel memory management, and to continue his investigation of the shuffle-exchange
vs. the butterfly.

Alan  Sherman 

Sherman (now faculty at Tufts University) has completed a monograph on the PI System for
placement and interconnect of custom VLSI circuits [285]. The PI System was designed and
implemented at MIT under the leadership of Rivest; Sherman was one of the key architects.
The monograph is being published by Springer-Verlag. Beginning September 1989, Sherman
will join the faculty at the University of Maryland, Baltimore County.

Robert Sloan

Sloan’s primary area of interest this past year was computational learning theory. His major
activity  for  the  year  was  preparing  his  doctoral  dissertation  [290].  Most  of  the  other  work  in 
computational  learning  theory  described  here  is  also  contained  in  that  work. 

Much  of  his  work  was  within  Valiant’s  model  of  probably  approximately  correct  learning 
[304]. Working with Helmbold and Warmuth while visiting the University of California at
Santa Cruz, he developed an algorithm for learning certain complex combinations of concept
classes known to be learnable [159].

In [275], the problem of learning arbitrary boolean concepts in the Valiant model by
breaking them into pieces and learning one piece at a time is studied. In other work, Sloan studied
the  effects  of  different  sorts  of  noise  on  learning  in  the  Valiant  model  [288]. 

He  explored  an  alternate  model  of  inductive  inference  in  [276]. 

Sloan  also  remains  interested  in  the  subject  of  cryptography,  and  spent  some  time  studying 
different  definitions  of  zero-knowledge  [289]. 

Clifford  Stein 

Stein  has  been  working  with  Shmoys  on  developing  parallel  algorithms  for  combinatorial 
optimization  problems.  Together  with  Klein  of  Harvard,  he  developed  a  parallel  algorithm 
to find a maximal set of edge disjoint cycles in an undirected graph in O(log n) time using m
processors on a CRCW PRAM. A maximal set of edge disjoint cycles is a set of cycles whose
removal from the graph renders the graph acyclic. Stein and Klein have also been able to
generalize this result to multi-graphs and obtain an algorithm which runs in O(log n log C)
time, where C is the largest multiplicity of any edge [181].

Using this algorithm, Stein has developed an algorithm which finds a cycle cover containing
O(m + n log n) edges using O(log^2 n) time on m processors. A cycle cover is a set of cycles
such that every edge in the graph appears in at least one cycle.

Stein  has  observed  that  the  parallel  matching  algorithms  of  [242]  and  [174]  can  be  combined 
with scaling to achieve RNC algorithms for the assignment problem which use a number of
processors  independent  of  the  size  of  the  largest  number  in  the  problem,  by  slowing  down 
the  running  time  by  a  factor  proportional  to  the  logarithm  of  the  size  of  the  largest  number 
in  the  problem. 

Stein has also been rewriting his undergraduate thesis [295] for publication. Together with
Ahuja, Orlin, and Tarjan, he developed efficient algorithms for a wide variety of network
flow problems in bipartite graphs. The main results are of the following form: given a
bipartite graph with n nodes, but only n_1 nodes in the smaller half of the bipartition, an
algorithm which runs in time O(f(n, m)) can be converted into an algorithm which runs in
time O(f(n_1, m) + n_1 m). This approach leads to an algorithm for bipartite maximum flow
which runs in O(n_1 m log(n_1^2/m + 2)) time, an algorithm for bipartite minimum cost circulation
which runs in O(n_1 m log n_1 log(n_1 C)) time, and an algorithm for parametric maximum flow
which solves l bipartite maximum flow problems in O(ln + n_1 m log(n_1^2/m + 2)) time.

Margaret  C.  Tuttle 

Tuttle joined the Theory Group this year and has been working with Shmoys on approximation
algorithms for the Mixed Postman Problem: given a weighted graph G, find a least-cost
tour of G which traverses each edge at least once. When G is totally directed or totally
undirected, the problem can be solved in polynomial time. When G is a mixed graph (i.e.,
some edges are directed and some are undirected), the problem is NP-complete (as shown
by Papadimitriou in 1976).

This  summer  she  will  continue  working  with  Shmoys. 

Joel  Wein 

Wein  has  been  working  with  Shmoys  on  parallel  graph  algorithms.  He  recently  extended 
a  result  of  Karloff’s  to  obtain  a  Las  Vegas  RNC  algorithm  for  minimum  weight  perfect 
matching,  where  the  weights  are  represented  in  unary.  This  problem  was  shown  to  be  in 
RNC by Karp, Upfal, and Wigderson, but the algorithm was Monte Carlo in nature: it
yielded  a  correct  solution  with  high  probability,  but  was  unable  to  determine  if  the  solution 
was  indeed  optimal.  Wein  developed  a  way  to  carry  out  this  certification  in  RNC,  yielding 
a  robust  Las  Vegas  algorithm  that  can  verify  optimality.  The  result  utilizes  a  structure 
theorem of Sebő for the T-join problem and yields an RNC Las Vegas algorithm for that
problem as well.

Over  the  summer,  Wein  worked  at  Thinking  Machines  Corporation,  developing  practical 
Connection Machine implementations for various optimization problems. He intends to
continue  working  on  both  practical  and  theoretical  aspects  of  parallel  computation. 


Su-Ming  Wu 

Working with Tardos, Wu has developed an O(n^2) algorithm for the problem of finding two
edge-disjoint paths in a graph [112]. The basis for the algorithm is a graph-theoretic proof
of Seymour (Bell Communications Research Laboratory).

14.1 Introduction


The Theory of Distributed Systems Group has continued its work on algorithms and impossibility
results for distributed problems, as well as its work on modeling, proof techniques,
and  applications.  Particular  highlights  this  year  include  our  work  on  atomic  registers,  on 
real  time  systems,  and  on  the  design  of  a  system  for  simulating  distributed  algorithms. 

14.2  Faculty  Reports 

Nancy  A.  Lynch 

This  year,  Nancy  Lynch  worked  on  combinatorial  results  (and  modeling)  for  asynchronous 
communication  protocols.  The  paper  [222]  shows  the  impossibility  of  implementing  reliable 
data  link  behavior  in  the  face  of  certain  assumptions  about  physical  channels  and  about 
node  failures.  Besides  the  combinatorial  results,  another  contribution  of  this  paper  is  the 
style  of  problem  specification,  done  in  terms  of  I/O  behavior  of  physical  channels  and  data 
links.  She  has  continued  this  work  by  proving  a  related  impossibility  result  for  “oblivious” 
non-FIFO  physical  channels,  and  by  simplifying  the  specifications  used  in  [222].  Both  of 
these  efforts  are  still  in  progress. 

Other combinatorial work this year includes some new complexity results for real time computing
(see below). Lynch completed revisions of two older papers [105] [69]. Also, she
worked on the problem of processor renaming in asynchronous systems, mostly unsuccessfully.

Lynch  also  worked  on  general  models  for  concurrent  systems.  With  Mark  Tuttle,  she  wrote 
a  short  paper  [226]  introducing  the  I/O  automaton  model;  a  longer  journal  version  is  still 
planned.  This  model  is  used  to  redo  a  pair  of  prior  results,  done  originally  in  other  models 
[108][17]; the revisions appear to be somewhat simpler than the originals. Lynch also supervised
Magda Nour’s SB thesis project, in which Magda established interesting connections
between  the  Unity  model  of  Chandy  and  Misra  and  the  I/O  automaton  model.  Other  work 
on  modeling  includes  work  on  modeling  real  time  systems  (see  below). 
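
As a rough illustration of the flavor of the I/O automaton model, the sketch below encodes a reliable FIFO channel as an object with one input action and one output action. It reflects only the basic convention that input actions are enabled in every state while output actions are under the automaton's own control; the class name, method names, and the `run` driver are assumptions for this example and are not taken from [226].

```python
from collections import deque

class FifoChannel:
    """A reliable FIFO channel: input action send(m), output action receive(m)."""

    def __init__(self):
        self.queue = deque()                # the automaton state

    def input_send(self, m):
        # Input actions are enabled in every state (input-enabledness).
        self.queue.append(m)

    def enabled_outputs(self):
        # Output actions are controlled by the automaton and depend on its state.
        return [("receive", self.queue[0])] if self.queue else []

    def perform(self, action):
        assert action in self.enabled_outputs(), "action not enabled"
        self.queue.popleft()
        return action

def run(channel, messages):
    """Feed all inputs, then let the channel perform outputs until none is enabled."""
    trace = []
    for m in messages:
        channel.input_send(m)
        trace.append(("send", m))
    while channel.enabled_outputs():
        trace.append(channel.perform(channel.enabled_outputs()[0]))
    return trace
```

The list returned by `run` plays the role of one execution's sequence of external actions; the model itself also covers internal actions, composition, and fairness, none of which this sketch attempts.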

In addition, Lynch worked on using I/O automata to verify correctness of complicated concurrent
algorithms. This verification work includes the correctness proof [308] of the Gallager,
et al. Minimum Spanning Tree algorithm [122], and the paper [309] on Drinking Philosophers.
Lynch supervised some revisions of Russel Schaffer’s SB thesis on verifying atomic
register protocols [280]. She also supervised Chris Colby’s SB thesis on verifying the correctness
of the Peterson-Fischer tournament-structured mutual exclusion algorithm. Finally,
she used I/O automata in her consulting work at Apollo Computer, to verify the correctness
of a complex algorithm for managing highly available replicated data. This proof has
an interesting structure, based on multivalued abstraction mappings: first, a non-garbage-collected
version of the algorithm is proved correct, and then the “real” algorithm, using
garbage collection of old updates, is proved correct using abstraction mappings relating it
to the non-garbage-collected version.

Many of these proofs are done in a style and at a level of detail that make them suitable
candidates  for  machine  verification;  with  John  Guttag  and  Steve  Garland,  Lynch  is  exploring 
the  possibility  of  using  LP  to  perform  such  verification. 

Work also continued on the theory of atomic transactions, although at a slower pace than
last  year.  This  work  is  a  series  of  papers  on  modeling  and  verifying  different  kinds  of 
transaction-processing  algorithms,  culminating  in  a  book  [224]  to  tie  them  all  together.  This 
year,  she  carried  out  substantial  revisions  of  [104]  and  [160]  (not  yet  complete).  Our  papers 
on  locking  algorithms  [104]  and  timestamp  algorithms  [18]  appeared  in  conferences.  Besides 
the  revisions,  our  current  work  in  progress  includes  results  relating  our  theorems  to  those 
of  the  “classical  theory”  of  database  concurrency  control,  and  results  about  algorithms  that 
carry  out  concurrency  control  simultaneously  at  several  different  levels  of  data  abstraction. 

Lynch began new work on a theory for real time systems (and more generally, for timing-based
systems) as part of the ONR’s new initiative on real time computing. She gave a
talk  at  the  ONR  Workshop  on  Real  Time  Systems  in  November,  on  “Modeling  Real  Time 
Systems.”  This  introductory  talk  showed  how  I/O  automata,  extended  to  include  time  as  in 
[238],  could  be  used  to  model  the  timing  restrictions  and  requirements  of  real  time  computing 
systems.  Working  with  Hagit  Attiya,  she  continued  to  pursue  some  of  the  ideas  in  the  talk,  in 
particular,  to  study  upper  and  lower  bounds  for  combinatorial  problems  in  real  time  systems. 
For  example,  in  [23],  they  proved  upper  and  lower  bounds  on  both  centralized  and  distributed 
versions  of  a  timing-based  variant  of  the  mutual  exclusion  problem  (which  appeared  in  the 
real  time  literature  as  the  “nuclear  reactor  problem”).  Stephen  Ponzio  carried  out  related 
work  (described  below)  on  the  Dining  Philosophers  problem. 

Our correctness proofs for the algorithms in [23] turned out to have a very interesting style,
adapting  standard  proof  techniques  for  proving  safety  properties  (such  as  invariant  assertions 
and  abstraction  mappings)  for  use  in  timing-based  systems.  We  are  currently  working  on 
developing  these  proof  methods  for  timing-based  algorithms  in  appropriate  generality. 

Lynch gave the Keynote Address at the 1988 Symposium on Principles of Distributed Computing.
The talk she gave was a survey of the many impossibility results that have been
proved  in  this  research  field.  Preparing  for  this  talk  was  itself  a  major  project  of  hunting 
down  the  results  and  classifying  them.  She  has  written  a  paper  based  on  this  talk  [219],  to 
appear  in  this  year’s  PODC  proceedings. 

Also, Lynch put the final touches on the paper [194], which surveys the research area of the
theory of distributed computing, and has recently written a new NSF proposal with
Baruch Awerbuch.

Research  service  activity  this  year  included  serving  as  editor  of  a  special  issue  of  IEEE 
Transactions  on  Computer  Systems  on  parallel  and  distributed  algorithms,  and  also  as  an 
editor  for  Information  and  Computation.  Lynch  also  served  on  the  selection  committee 
for this year’s ACM Thesis Prize, and on the Program Committee for the annual ACM
Symposium  on  Theory  of  Computing. 

Besides supervising her own research students, Lynch served as reader on thesis committees
for  Radia  Perlman  at  MIT,  and  for  Lisa  Higham  at  the  University  of  British  Columbia. 

With  Ken  Goldman,  Lynch  put  together  a  set  of  course  notes  for  her  class  on  Distributed 
Algorithms  [221]. 

Plans  for  the  near  future  are  to  continue  her  work  on  combinatorial  results,  especially  those 
involving  timing-based  computation,  consensus  and  atomic  objects.  She  also  plans  to  finish 
her  book  on  atomic  transactions  and  to  continue  her  work  on  modeling  concurrent  systems, 
including  those  that  use  timing  assumptions  and  randomization.  She  will  also  continue  to  try 
to  use  the  I/O  automaton  model  to  describe  the  semantics  of  other  frameworks  and  languages 
for  concurrent  computation,  and  to  prove  correctness  of  difficult  concurrent  algorithms. 

14.3  Research  Associate  and  Student  Reports 

Hagit  Attiya 

Research Associate Hagit Attiya worked with Michael Fischer, Da-Wei Wang, and Lenore
Zuck,  of  Yale  University,  on  the  Sequence  Transmission  Problem  [22].  Here,  a  processor 
should  transfer  a  sequence  of  data  items  to  another  processor  over  an  asynchronous  channel. 
It was shown that there is a protocol using finite-sized messages for this problem over an
asynchronous channel that may reorder and delete messages.
This  is  in  contrast  with  the  results  in  [222],  where  it  was  shown  that  there  is  no  bounded 
protocol  for  solving  the  related  data  link  layer  problem. 

The rest of the research done by Attiya during this period can be divided into two areas:
timing-based problems, and wait-free coordination.


Timing-based  Problems 

This work, joint with Nancy Lynch, uses the timed I/O automata framework (introduced in
[238]; see also [218]).

They considered a timing-based variant of the mutual exclusion problem. In this variant,
only an upper bound on the time it takes to release the resource is known, as opposed to
receiving an explicit signal when the resource is released. Furthermore, the only mechanism
to measure real time is an inaccurate clock, whose tick intervals take time between two
known constant bounds. Upper and lower bounds on the response time of any algorithm
solving this problem were proven, for algorithms where the control is either centralized or
distributed.
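
To see why an inaccurate clock costs response time, consider a back-of-the-envelope calculation. It is only an illustration of the setting, not the bounds proved in [23]. Assume the resource is released within u time units, each tick interval lasts between c1 and c2, and counting starts exactly at a tick; the names u, c1, and c2 and both functions are hypothetical.

```python
import math
from fractions import Fraction

def ticks_to_wait(u, c1):
    """Smallest k such that k ticks, each lasting at least c1, span at least u time."""
    return math.ceil(Fraction(u) / Fraction(c1))

def worst_case_delay(u, c1, c2):
    """Longest real time those k ticks can take when each tick lasts at most c2."""
    return ticks_to_wait(u, c1) * Fraction(c2)
```

For example, with u = 10, c1 = 2, and c2 = 3, a process must count 5 ticks to be sure that 10 time units have passed, and those 5 ticks may take as long as 15 time units.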

The lower bound proofs make use of new techniques. In order to prove the correctness of
these algorithms, Lynch and Attiya developed a way to transform any timed automaton
into a “regular” automaton, by building timing information into the automaton state, thus
enabling the use of classic invariant assertion proof techniques. Surprisingly, this method can
be extended to prove the performance of algorithms. This direction is explored in [220].


Wait-free  Coordination 

Attiya worked with Danny Dolev (of IBM Almaden and Hebrew University) and Nir Shavit
on wait-free solutions of problems using bounded shared memory. The goal of this research
is the development of tools that will eventually enable us to reveal the exact relationship
between boundedness of memory and wait-freeness of operations. Two specific problems
considered are:


1. Randomized Consensus: All the algorithms known for this problem, and in particular
the only polynomial algorithm ([19]), use shared memory with unbounded values.
Attiya, Dolev, and Shavit found a bounded polynomial solution to the problem [21],
answering the open question of Abrahamson [2].

2.  Snapshot  Scan:  A  snapshot  scan  returns  an  “instantaneous”  picture  of  memory.  Such 
an  algorithm  will  greatly  simplify  proofs  of  concurrent  programs,  and  is  an  important 
building block in many algorithms [95][2][19]. The correctness of a bounded construction
that solves this problem is currently being proven (joint work with Michael Merritt
and  Yehuda  Afek  of  AT&T  Bell  Labs). 
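
To make the notion of an “instantaneous” picture concrete, the following sketch uses the well-known double-collect idea: read all cells twice and accept the result only if nothing changed in between. It is emphatically not the bounded construction mentioned above. It relies on unbounded sequence numbers, it may retry forever under continuous updates (so it is not wait-free), and it runs in a single thread purely for illustration. All names are assumptions.

```python
class SharedMemory:
    """Cells hold (value, sequence number); every update bumps the sequence number."""

    def __init__(self, n):
        self.cells = [(None, 0)] * n

    def update(self, i, value):
        _, seq = self.cells[i]
        self.cells[i] = (value, seq + 1)

    def collect(self):
        return list(self.cells)

def scan(mem, between_collects=None):
    """Repeat double collects until two consecutive collects agree."""
    while True:
        first = mem.collect()
        if between_collects is not None:
            between_collects()              # hook that simulates a concurrent writer
        second = mem.collect()
        if first == second:
            # No sequence number moved, so these values coexisted at some instant.
            return [value for value, _ in second]
```

The `between_collects` hook exists only so that interference can be simulated deterministically in a sequential test.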


In joint work with Mark Tuttle [24], new proof techniques were developed to prove lower
bounds for problems in both the shared memory and message passing models of computation.
These techniques are nontrivial generalizations of [108][217][67], and, as a result of the
similarity of the proofs in the two models, have the advantage of exposing similarities between
the shared memory and message passing models. Using these techniques, it is possible
to obtain a new tight lower bound for the slotted l-exclusion problem [20] (a similar lower
bound was proved for the related problem of l-assignment in [70]). Furthermore, these techniques
give simple proofs for known results, such as consensus [108] and processor renaming
[20].

Chris  Colby 

Chris  Colby  worked  on  two  projects. 

Ken  Goldman  is  developing  an  I/O  automaton  distributed  simulation  system.  It  will  be  used 
to  aid  in  the  design  and  study  of  distributed  algorithms  using  the  I/O  automaton  model. 
During the summer of 1988, Colby worked on the development of a graphical user interface
for the system. The interface allows the user to graphically configure I/O automaton systems
by composing automata hierarchically and specifying topology information. The interface
will be used to observe automaton states during simulation, and may eventually be used to
guide  the  particular  execution  path  taken  by  the  simulator. 

Colby also finished his undergraduate thesis entitled Correctness Proofs of the Peterson-Fischer
Mutual Exclusion Algorithms. In this thesis, the Peterson-Fischer 2-process mutual
exclusion algorithm [260] is introduced in a slightly modified form. An invariant-assertional
proof of mutual exclusion is presented for the 2-process algorithm. Next, the Peterson-Fischer
n-process mutual exclusion algorithm is introduced conceptually as a tournament
of ⌈lg n⌉ 2-process competitions. A mutual-exclusion proof of the n-process algorithm is
presented, based on a mapping between states of the n-process system and states of the
2-process system. This mapping delineates the correspondence between the 2-process code and
one iteration (competition) of the n-process code. In this way, the statement of correctness
of the 2-process algorithm is used as a lemma for the n-process proof.
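
The tournament structure can be sketched with a little index arithmetic. The code below does not implement the Peterson-Fischer algorithm itself; it only shows one plausible way to number the 2-process competitions that a process passes through in a binary tournament tree. The function names and the (level, node, side) encoding are assumptions for illustration.

```python
def rounds(n):
    """Number of levels in a binary tournament among n >= 1 processes: ceil(log2 n)."""
    return (n - 1).bit_length()

def competitions(p, n):
    """List (level, node, side) for each 2-process competition that process p enters."""
    return [(k, p >> (k + 1), (p >> k) & 1) for k in range(rounds(n))]
```

With n = 8, process 5 competes at node 2 of level 0 (on side 1), then at node 1 of level 1 (on side 0), and finally at the root (on side 1). Each entry corresponds to one iteration of the n-process code, which is the unit that the thesis maps onto the 2-process algorithm.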

Alan  Fekete 

Alan Fekete was a member of the TDS Group until September 30, 1988. He worked on the
theory of concurrency control for nested transactions. In particular, he developed with Nancy
Lynch a way to use the general theory given in the paper [223] to prove a sufficient condition
for correctness that resembles the “absence of cycles” condition used in the conventional
theory of serializability for transactions without nesting. With this condition, they gave a
simple direct proof of the correctness of Moss’ algorithm for read/update locking. He also
found a way to model and verify some “optimistic” timestamp-based concurrency control
algorithms, which allow some transactions to proceed in the hope that no errors will occur,
and only check that in fact nothing went wrong before commit occurs (rather than before
each access to objects). Another area of Fekete’s work was the possibility and impossibility
of solving certain problems concerned with building a reliable message service on top of an
unreliable service. Results proved by Lynch, Mansour, and Fekete [222] were compared with
those of Attiya, Fischer, Wang, and Zuck, which used a significantly different model of a
system, in order to see in which respects the results in one model carried over into the other.
A result proved with Lynch shows that some sequence information was essential, even under
very weak constraints on the system.
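
For readers unfamiliar with the conventional “absence of cycles” condition, the sketch below checks it for flat (non-nested) transactions: build a conflict graph over transactions and test whether it is acyclic. This is the classical condition only, not the nested-transaction condition developed by Fekete and Lynch. The schedule format of (transaction, operation, item) triples and all function names are assumptions.

```python
def conflict_edges(schedule):
    """Edges t1 -> t2 whenever an operation of t1 precedes a conflicting one of t2."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Two operations conflict if they touch the same item and one is a write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges

def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    return removed == len(nodes)

def conflict_serializable(schedule):
    nodes = {t for t, _, _ in schedule}
    return is_acyclic(nodes, conflict_edges(schedule))
```

A schedule in which T1 reads x, T2 writes x, and T1 then writes x produces edges in both directions between T1 and T2, so the check fails.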

Ken  Goldman 

Ken Goldman has been working on his Ph.D. thesis, Simulation of Concurrent Algorithms
Using I/O Automata. He has been designing a strongly typed language based on the I/O
automaton model and a simulation system for studying algorithms expressed in that language.
The system is intended as a research tool to aid in the design and understanding
of distributed algorithms. Chris Colby has been implementing a graphical interface for the
system (see above). In [126], Goldman explored the possibility of distributing the simulation
in a highly concurrent manner, while fully preserving the semantics of the model.

As part of his area examination, Goldman studied the languages Lisp, Connection Machine
Lisp, and Paralation Lisp in terms of their utility for writing efficient scientific programs
on the Connection Machine. In [127], he reviews those languages and proposes a set of
extensions to Paralation Lisp for improving both the expressiveness and the efficiency of
that language.

John Leo

John  Leo  has  been  continuing  work  on  his  Master’s  thesis  I/O  Automata  Techniques  and 
Examples,  expected  to  be  completed  by  August  1989.  The  thesis  is  based  on  two  examples, 
SIFT  and  OLYMPIC  TORCH,  both  involving  process  creation.  The  emphasis  of  the  thesis 
is to demonstrate proof techniques and create new tools for correctness proofs using I/O
automata. One such tool is a version of mappings from components of a composition to a
single higher level specification, adapted from [227]. Other tools will be theorems concerning
message  passing  systems  and  local  improvements. 

Magda  Nour 

Magda Nour worked on and finished her undergraduate thesis entitled An Automata-Theoretic
Model for UNITY.

UNITY (Unbounded Nondeterministic Iterative Transformation) is a computational model
and a proof system, developed by K. Mani Chandy and Jayadev Misra at the University of
Texas, to aid in the design of parallel programs.
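
In the UNITY computational model, a program is a set of assignment statements; execution repeatedly selects some statement, subject to fairness, and the program is considered finished when it reaches a fixed point at which no statement changes the state. The sketch below imitates this with a seeded random scheduler. The sorting example, the function names, and the use of random selection as a stand-in for fair nondeterministic choice are assumptions made for illustration.

```python
import random

def swap_if_out_of_order(i):
    """Statement i: exchange positions i and i+1 when they are out of order."""
    def statement(a):
        if a[i] > a[i + 1]:
            return a[:i] + (a[i + 1], a[i]) + a[i + 2:]
        return a
    return statement

def run_unity(state, statements, rng):
    """Execute randomly chosen statements until no statement changes the state."""
    while any(stmt(state) != state for stmt in statements):
        state = rng.choice(statements)(state)
    return state
```

Calling `run_unity((3, 1, 2, 0), [swap_if_out_of_order(i) for i in range(3)], random.Random(0))` ends in the sorted tuple, because the only fixed point of these statements is the sorted order.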

The Input/Output Automaton model is a computational model developed by Nancy Lynch
and  Mark  Tuttle  that  may  be  used  to  model  concurrent  and  distributed  systems. 

This  thesis  connects  these  two  theories.  Specifically,  it: 


1. defines UNITY Automata, a subset of I/O automata based on the UNITY computational
model, the UNITY program;

2. defines a mapping from UNITY programs to UNITY automata;

3. adapts the UNITY proof concepts to the I/O automaton computational model in order
to obtain UNITY style proof rules for I/O automata;

4. adapts UNITY composition operators to the I/O automaton model and obtains composition
proof rules for them; and

5. considers various examples illustrating the above work.


In addition, this work introduces an augmentation to the I/O automaton model which facilitates
reasoning about randomized algorithms, adapts UNITY concepts to it, and presents
an  example  of  a  UNITY  style  high  probability  proof  using  such  a  model. 

Stephen Ponzio

Stephen  Ponzio  is  currently  studying  the  complexity  of  solutions  to  the  “dining  philosophers 
problem”  in  a  real  time  system.  If  the  maximum  amount  of  real  time  that  any  processor 
spends in the critical region is bounded by some constant c, then a simple alternating algorithm
guarantees a waiting time of 3c + O(1). He shows that in a system of n processors
which issue requests asynchronously, no algorithm can guarantee a waiting time of less than
2c. He also gives an algorithm that guarantees 2c + O(n) for n even. Future research will
improve  upon  the  known  algorithms  or  raise  the  lower  bound  by  including  terms  such  as 
message  delay  time.  Other  problems  fundamental  to  distributed  computing  will  also  be 
considered. 

Nir  Shavit 

Nir Shavit worked with Danny Dolev of IBM ARC on the Bounded-Concurrent-Time-Stamping
problem. Concurrent time stamping is at the heart of solutions to some of the most
fundamental problems in distributed computing. Based on concurrent-time-stamp-systems,
elegant and simple solutions to core problems such as fcfs-mutual-exclusion, construction
of a multi-reader-multi-writer atomic register, probabilistic consensus, etc. were developed.
Unfortunately, the only known implementation of a concurrent-time-stamp-system has been
theoretically unsatisfying, since it requires unbounded size time-stamps; in other words,
unbounded memory. Not knowing if bounded concurrent-time-stamp-systems are at all constructible,
researchers were led to constructing complicated problem-specific solutions to
replace the simple unbounded ones. In this work, for the first time, a bounded implementation
of a concurrent-time-stamp-system is presented. It provides a modular unbounded-to-bounded
transformation of the simple unbounded solutions to problems such as those above. It
allows solutions to two formerly open problems, the bounded-probabilistic-consensus problem
of Abrahamson [2], and the FIFO-l-exclusion problem of [110], and a more efficient
construction of mrmw-atomic registers. This work [95] was presented at STOC 1989.
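
The “simple unbounded” style of time stamping that the bounded construction is meant to replace can be sketched as follows: a process takes a new label by choosing one more than the largest label currently held. This sequential sketch ignores concurrency entirely and is not the bounded system of [95]; the class and method names are assumptions. It does make the drawback visible, since labels grow without limit.

```python
class UnboundedTimestamps:
    """One integer label per process; a new label exceeds every existing label."""

    def __init__(self, n):
        self.labels = [0] * n

    def new_label(self, i):
        # Labels only ever grow, so storing them needs unbounded memory.
        self.labels[i] = 1 + max(self.labels)
        return self.labels[i]

    def order(self):
        """Processes from oldest to newest label, ties broken by process index."""
        return sorted(range(len(self.labels)), key=lambda i: (self.labels[i], i))
```

After k labeling operations the largest label equals k, which is exactly the growth that a bounded implementation has to avoid while still preserving the order of recent operations.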

Shavit  also  worked  with  Hagit  Attiya  and  Danny  Dolev  on  wait-free  solutions  of  problems 
using  bounded  shared  memory.  The  goal  of  this  research  is  the  development  of  tools  that 
will  eventually  enable  us  to  reveal  the  exact  relationship  between  unboundedness  of  memory 
and  wait-freeness  of  operations.  Two  specific  problems  we  considered: 


1. Randomized Consensus: All the algorithms known for this problem, and in particular
the only polynomial algorithm (due to Aspnes and Herlihy), use shared memory with
unbounded values. A bounded polynomial solution to the problem [21], answering the
open question of Abrahamson [2], will be presented at PODC 1989.

2. Snapshot Scan: We are interested in a scan algorithm that will return an “instantaneous”
picture of memory. Such an algorithm will greatly simplify proofs of concurrent
programs, and is an important building block in many algorithms [95][2][19]. We are
currently in the process of proving the correctness of a bounded construction that we
believe solves this problem (joint work with Mike Merritt and Yehuda Afek).

Shavit also worked with Baruch Awerbuch and Yishay Mansour on the end-to-end problem
in unreliable networks. This is a fundamental problem, dealing with the question of how to
assure that two nodes in a communication network such as Bitnet can guarantee communication
even if they are only eventually connected (see [315]). We present the first polynomial
solution to the problem [31], opening the possibility that with further research, a practically
efficient solution to the problem may be found.

Shavit  is  currently  working  with  Hagit  Attiya  and  Nancy  Lynch  on  a  line  of  research  that 
will  lead  to  a  better  understanding  of  the  notion  of  wait-freeness  and  its  relation  to  fault 
tolerance.  We  are  currently  attempting  to  verify  if  existing  definitions  really  capture  the 
intuitive  properties  attributed  to  wait-free  primitives. 

Shavit  is  in  the  process  of  completing  a  draft  of  joint  work  with  Mike  Merritt  and  Yehuda 
Afek  on  the  local  snapshot  algorithm,  an  algorithm  that  performs  snapshots  in  distributed 
message  passing  systems,  with  time  and  communication  complexity  dependent  on  the  flow 
of  computation  of  the  application,  rather  than  the  size  of  the  complete  network. 

Also,  Shavit  is  writing  his  Ph.D.  thesis,  touching  on  many  of  the  research  topics  mentioned 
above. 

Greg  Troxel 

Greg  Troxel  refined  an  algorithm  he  developed  for  detecting  and  recovering  from  process- 
execution  resource  deadlock  in  a  system  using  remote  procedure  calls  intended  for  the  Fault 
Tolerant  Parallel  Processor  being  developed  at  the  Charles  Stark  Draper  Laboratory.  He  is 
constructing  a  proof  using  I/O  automata  that  this  algorithm  is  correct.  Interesting  issues  in 
the  proof  are  the  formal  specification  of  correctness  conditions  of  the  algorithm,  since  this 
algorithm  functions  as  a  scheduler  for  the  remote  procedure  call  system,  and  demonstration 
of  liveness  properties. 

Mark  Tuttle 

Mark  Tuttle  is  primarily  interested  in  understanding  the  correctness  and  construction  of 
distributed  algorithms  in  terms  of  the  “knowledge”  individual  processors  in  a  distributed 
system  have  about  their  environment  (e.g.,  the  local  states  of  neighboring  processors,  etc.). 

His  current  interest  is  understanding  cryptographic  protocols  and  system  security  in  terms 
of formal notions of knowledge (e.g., [145]). In order to think about probabilistic protocols
like these in terms of knowledge, we have to be able to answer the question “What should it
mean for an agent to know or believe an assertion is true with probability .99?”. Different
papers [102][109][145] give different answers, choosing to use quite different probability spaces
when  computing  the  probability  an  agent  assigns  to  an  event.  In  [146],  joint  work  with  Joe 
Halpern,  they  show  no  single  choice  is  correct  in  all  contexts,  and  show  for  any  given  context 
how  to  make  the  most  appropriate  choice.  They  show  that  each  choice  can  be  understood 
in  terms  of  a  betting  game,  and  that  each  choice  corresponds  to  betting  against  a  different 
opponent. They consider three types of adversaries. The first selects the outcome of all
nondeterministic  choices  in  the  system;  the  second  represents  the  knowledge  of  the  agent’s 
opponent  (this  is  the  key  place  the  above-mentioned  papers  differ);  the  third  is  needed  in 
asynchronous  systems  to  choose  the  time  the  bet  is  placed.  They  illustrate  the  need  for 
considering  all  three  types  of  adversaries  with  a  number  of  examples.  Given  a  class  of 
adversaries,  they  show  how  to  assign  probability  spaces  to  agents  in  a  way  most  appropriate 
for  that  class,  where  “most  appropriate”  is  made  precise  in  terms  of  this  betting  game. 
They  conclude  by  showing  how  different  assignments  of  probability  spaces  (corresponding  to 
different  opponents)  yield  different  levels  of  guarantees  in  coordinated  attack. 

In  [241],  it  is  shown  how  to  construct  extremely  fast,  efficient  protocols  for  problems  like 
consensus  in  synchronous  systems,  problems  requiring  processors  to  perform  coordinated 
actions  simultaneously.  It  is  shown  that  the  construction  of  such  protocols  reduces  to  testing 
for a state of knowledge called common knowledge. Unfortunately, these results do not
extend  to  asynchronous  systems;  in  fact,  it  is  known  the  state  of  common  knowledge  cannot 
be  attained  in  such  systems  [144].  In  such  systems,  however,  the  state  of  eventual  common 
knowledge  [144]  appears  to  be  very  closely  related  to  the  solution  of  such  problems,  but  there 
are  no  useful  tools  for  proving  that  this  state  of  knowledge  is  or  is  not  attained,  let  alone  for 
a  processor  to  test  for  this  state  of  knowledge.  In  [302],  Tuttle  gives  a  new  game-theoretic 
characterization  of  eventual  common  knowledge,  a  characterization  that  may  be  a  first  step 
in  developing  such  tools. 

In  joint  work  with  Hagit  Attiya  [24],  new  proof  techniques  are  developed  to  prove  lower 
bounds  for  problems  in  both  the  shared  memory  and  message  passing  models  of  computation.
These  techniques  are  nontrivial  generalizations  of  [108][217][67],  and,  as  a  result  of
the  similarity  of  the  proofs  in  the  two  models,  have  the  advantage  of  exposing  similarities
between  the  shared  memory  and  message  passing  models.  Using  these  techniques  it  is  possible
to  easily  obtain  known  lower  bounds  on  consensus  [108],  processor  renaming  [20],  and
f-exclusion  [70]. 

Other  work  by  Tuttle  this  year  includes  further  exposition  [226]  with  Nancy  Lynch  of  the 
Input/Output  Automaton  model  of  distributed  computation  formalized  in  the  course  of  his
Master’s  thesis  work  [225],  and  further  thought  [238]  on  the  problem  of  adding  time  to  the 
I/O  automaton  model  to  allow  the  model  to  be  used  to  reason  about  real  time  systems. 

Computer  Science,  June  1988.


Theses  in  Progress 

[1]  K.  Goldman.  Concurrent  Algorithm  Simulation  Using  Input/Output  Automata.  PhD 
thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science,  1989.
Supervised  by  N.  Lynch.

[2]  J.  Leo.  I/O  Automata  Techniques  and  Examples.  Master’s  thesis,  MIT  Department  of 
Electrical  Engineering  and  Computer  Science,  1989.  Supervised  by  N.  Lynch. 

[3]  S.  Ponzio.  A  Real  Time  Analysis  of  Some  Problems  in  Distributed  Computing.
Master’s  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science,  1989.
Supervised  by  N.  Lynch.

[4]  N.  Shavit.  Concurrency  in  Communication.  PhD  thesis,  Hebrew  University,  Jerusalem. 
Supervised  by  D.  Dolev. 

[5]  G.  Troxel.  A  Hierarchical  Proof  of  an  Algorithm  for  Deadlock  Recovery  in  a  System
using  Remote  Procedure  Calls.  Master’s  thesis,  MIT  Department  of  Electrical  Engineering
and  Computer  Science,  1989.  Supervised  by  N.  Lynch.

[6]  M.  Tuttle.  Knowledge  and  Distributed  Computation.  PhD  thesis,  MIT  Department  of 
Electrical  Engineering  and  Computer  Science,  1989.  Supervised  by  N.  Lynch.


Theses  Completed 

[1]  C.  Colby.  Correctness  proofs  of  the  Peterson-Fischer  mutual  exclusion  algorithms. 
Bachelor’s  thesis,  MIT  Department  of  Electrical  Engineering  and  Computer  Science,
June  1989.  Supervised  by  N.  Lynch. 

[2]  M.  Nour.  An  Automata-theoretic  Model  for  Unity.  Bachelor’s  thesis,  MIT  Department
of  Electrical  Engineering  and  Computer  Science,  June  1989.  Supervised  by  N.  Lynch. 


Lectures 


H.  Attiya.  Reliable  communication  over  unreliable  channels.  Lecture  given  at  IBM  T. 
J.  Watson  Research  Center,  December  1988. 


H.  Attiya.  Time  bounds  for  real  time  process  control  in  the  presence  of  timing  uncertainty.
Lecture  given  at  IBM  Almaden  Research  Center,  April  1989.


N.  Lynch.  A  hundred  impossibility  proofs  for  distributed  computing.  Keynote  Address
given  at  the  Seventh  Annual  Symposium  on  Principles  of  Distributed  Computing,
Toronto,  Canada,  August  1988.


N.  Lynch.  A  theory  of  atomic  transactions.  Invited  address  given  at  the  2nd  International
Conference  on  Database  Theory,  Bruges,  Belgium  (August  1988);  CWI,  Amsterdam
(August  1988);  Princeton  University  (July  1989);  Harvard  University  (April
IBM  Almaden  (April  1989).


N.  Lynch.  Modeling  real  time  systems.  Lecture  given  at  ONR  Workshop  on  Real  Time
Systems,  Falls  Church,  VA,  November  1988.


N.  Lynch.  Three  impossibility  results  for  distributed  computing.  Lecture  given  at  Digital
Equipment  Corporation  (December  1988);  Memorial  University,  St.  John’s,  Newfoundland
(March  1989);  MIT  freshman  seminar  in  “schools  of  thought.”

N.  Lynch.  I/O  automata  -  a  model  for  discrete  event  systems.  Lecture  given  at  LCS 
Annual  Meeting  (June  1989);  Harvard  University  (December  1988). 

N.  Lynch.  Multivalued  possibilities  mappings.  Lecture  given  at  REX  Workshop  on 
Stepwise  Refinement  of  Distributed  Systems,  May  1989. 

M.  Tuttle.  Programming  simultaneous  actions  using  common  knowledge.  Lecture
given  at  IBM  T.J.  Watson  Research  Center  (February);  Cornell  University  (February);
DEC  Cambridge  Research  Laboratory  (March);  Brown  University  (March);  New  York 
University/Courant  Institute  (March);  Princeton  University  (March);  Yale  University 
(April),  1989. 

15.1  Overview  of  the  MIT  X  Consortium

The  MIT  X  Consortium  was  formed  in  January  1988  to  further  the  development  of  the 
X  Window  System.  The  major  goal  of  the  Consortium  is  to  promote  cooperation  within 
the  computer  industry  in  the  creation  of  standard  software  interfaces  at  all  layers  in  the
X  Window  System  environment.  MIT’s  role  is  to  provide  the  vendor-neutral  architectural 
and  administrative  leadership  required  to  make  this  work.  The  Consortium  is  financially 
self-supporting,  with  membership  open  to  any  organization.  At  present,  over  60  companies 
belong  to  the  Consortium,  as  well  as  several  universities.  These  members  represent  the  bulk 
of  the  US  computer  industry,  as  well  as  a  considerable  segment  of  the  international  industry.


15.2  Current  Status  and  Future  Plans 

15.2.1  Release  3 

One  of  the  primary  tasks  of  the  Consortium  staff  is  the  maintenance  and  evolution  of  a  software
distribution  containing  sample  implementations  of  all  interfaces  defined  by  the  Consortium,
as  well  as  numerous  applications  and  utilities.  In  October,  Release  3  of  this  distribution,
consisting  of  26  megabytes  of  source  code,  was  made  available  to  the  world,  along  with
a  companion  collection  of  roughly  80  megabytes  of  source  code  of  user-contributed  software. 
The  distribution  is  available  using  anonymous  FTP  from  a  number  of  Internet  sites,  and  on 
magnetic  tape  from  the  MIT  Software  Center. 

15.2.2  Configuration  Management 


Configuration  management  is  an  important  aspect  of  any  large  software  system,  and  the  X 
distribution  is  no  exception.  Development  is  performed  on  more  than  half  a  dozen  different 
platforms,  each  running  a  different  operating  system,  and  the  system  is  used  externally  on
a  variety  of  additional  platforms.  Details  of  how  to  build  and  install  the  system  vary  with 
every  operating  system  (networking  interfaces,  program  names,  compile  options,  header  and 
library  files,  etc.)  and  machine  type  (low  level  graphics  code  is  device-specific),  as  well  as 
with  site-specific  preferences  (where  programs  should  be  installed,  default  values,  etc.).

Jim  Fulton  has  done  extensive  work  during  the  past  year  on  configuration  management
for  the  X  distribution.  Release  3  has  been  widely  praised  for  its  ease  of  portability  and
installation,  and  significant  improvements  have  been  made  since  then.

15.2.3  X  Conference 


In  January,  we  hosted  the  Third  Annual  X  Technical  Conference.  The  purpose  of  the  conference
is  to  present  and  discuss  leading  edge  research  and  development  in  the  X  environment
from  both  academia  and  industry.  This  year’s  conference  consisted  of  seven  tutorials,  2>)
presentations,  and  14  informal  “birds  of  a  feather”  sessions,  spread  over  three  days.  Donna
Converse  and  Michelle  Leger  handled  the  bulk  of  the  organizational  details,  including
registration,  scheduling,  catering,  conference  proceedings,  and  video  tape  coordination.  The
conference  was  attended  by  approximately  1100  people,  free  of  charge,  and  was  well
received.  Since  attendance  has  grown  every  year  and  we  reached  the  seating  capacity  of  MIT
facilities,  we  expect  to  hold  next  year’s  conference  off-campus.

15.2.4  MIT  Research  Funding


The  X  Consortium  membership  would  like  to  see  MIT  remain  the  leader  and  focal  point  in 
research  and  development  of  the  X  Window  System.  To  that  end,  the  X  Consortium  created 
a  funding  pool  to  encourage  MIT  faculty,  staff,  and  students  to  participate  in  X-related 
research  and  development  activities.  The  pool  for  1989  was  set  at  $250,000.  Although 
relatively  few  proposals  have  been  received,  two  projects  have  been  approved,  and  approval 
of  several  more  is  anticipated. 

15.2.5  Sample  Server  Implementation 


Keith  Packard  made  several  improvements  to  the  MIT  X  server  implementation.  Using  a 
prototype  provided  by  Adam  de  Boor  at  UC  Berkeley,  Keith  wrote  a  device-independent
implementation  of  backing-store  and  save-unders  (in  which  the  server  saves  portions  of  windows
that  are  obscured  by  other  windows  so  that  some  exposures  can  be  handled  automatically)
for  Release  3.  In  addition,  Keith  spent  considerable  time  and  energy  producing  an
implementation  for  drawing  arcs  that  conforms  to  the  X  protocol  specification.  Although  the
version  that  was  distributed  in  Release  3  was  rather  slow,  it  has  been  sped  up  significantly.
In  addition,  we  recently  received  some  very  important  enhancements  from  Joel  McCormack 
at  Digital  Equipment  Corporation  that  optimize  region  operations  and  window  hierarchy 
manipulations.  Keith  added  some  of  his  own  improvements  and  simplifications  to  this  code, 
and  integrated  it  into  our  implementation. 

Up  through  Release  3,  failure  to  allocate  memory  would  result  in  server  termination.  Since 
catastrophic  failure  is  not  a  desired  response  (particularly  in  limited  memory  environments 
such  as  X  terminals),  this  was  of  considerable  concern.  Bob  Scheifler  has  since  reworked  the
server  to  survive  most  memory  allocation  failures,  reporting  errors  back  to  the  requesting 
client  in  accordance  with  the  X  protocol  specification,  and  continuing  to  operate  normally. 
The  only  failures  which  are  not  yet  handled  adequately  are  those  occurring  during  region
operations,  principally  during  window  hierarchy  reconfiguration.  Bob  Scheifler  and  Keith
Packard  have  developed  a  strategy  for  gracefully  surviving  these  failures,  and  future
implementation  work  is  planned.
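
The change in failure handling described above can be sketched in a few lines. This is only an illustration under stated assumptions: the SketchServer class, its byte budget, and the create_pixmap request are hypothetical stand-ins, not the MIT server's code. The one detail taken from the X protocol is that a failed allocation is reported to the client as a BadAlloc error (code 11).

```python
BAD_ALLOC = 11  # X protocol error code for a failed allocation


class SketchServer:
    """Toy server with a fixed memory budget (hypothetical, for illustration)."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0

    def _allocate(self, nbytes):
        # Stand-in for the server's allocator: signal failure instead of exiting.
        if self.used_bytes + nbytes > self.budget_bytes:
            raise MemoryError(nbytes)
        self.used_bytes += nbytes

    def create_pixmap(self, client_id, width, height):
        """Handle one request and return a reply or an error for that client."""
        try:
            self._allocate(width * height)
        except MemoryError:
            # Report the failure to the requesting client and keep running.
            return {"client": client_id, "error": BAD_ALLOC}
        return {"client": client_id, "ok": True}


server = SketchServer(budget_bytes=10_000)
first = server.create_pixmap(1, 50, 50)     # 2500 bytes: fits in the budget
second = server.create_pixmap(2, 200, 200)  # 40000 bytes: refused with an error
third = server.create_pixmap(1, 10, 10)     # the server is still serving requests
```

The point of the sketch is that the oversized request affects only the client that made it; later requests are handled normally.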

In  addition  to  recovering  from  allocation  failures,  overall  memory  consumption  in  the  server
has  been  significantly  reduced.  Bob  Scheifler  and  Keith  Packard  have  designed  new  data
structures  for  the  major  server  resources  (windows  and  graphics  contexts)  that  should  cut
memory  size  approximately  in  half.  A  further  benefit  of  this  work  was  a  revised  strategy  for 
layering  in  the  server  (based  on  the  usual  object-oriented  strategy  of  using  wrappers  around 
methods)  that  will  considerably  simplify  the  implementations  of  backing-store  and  software 
cursors. 
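
As a rough illustration of layering by wrapping methods, the sketch below lets each layer save the method currently installed on a screen object and install its own version that does its work and then calls through. All names here (Screen, paint_window, add_layer, and the two layer functions) are hypothetical; the report does not describe the server's actual data structures.

```python
class Screen:
    """Toy screen whose drawing methods can be replaced at run time."""

    def __init__(self):
        self.trace = []
        self.paint_window = self._paint_window  # replaceable method slot

    def _paint_window(self, window):
        self.trace.append(("paint", window))


def add_layer(screen, method_name, before):
    """Wrap screen.<method_name> so `before` runs first, then the wrapped method."""
    inner = getattr(screen, method_name)

    def wrapper(*args):
        before(screen, *args)
        return inner(*args)

    setattr(screen, method_name, wrapper)


def backing_store_layer(screen, window):
    # A backing-store layer would save obscured contents here.
    screen.trace.append(("save-obscured", window))


def cursor_layer(screen, window):
    # A software-cursor layer would remove the cursor image before drawing.
    screen.trace.append(("hide-cursor", window))


screen = Screen()
add_layer(screen, "paint_window", backing_store_layer)
add_layer(screen, "paint_window", cursor_layer)
screen.paint_window("w1")
```

Because each layer only knows about the method it wrapped, layers can be added or removed without changing the device-dependent code underneath.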

15.2.6  Standard  Colormaps 


Standard  Colormaps  are  a  mechanism  that  allows  applications  to  share  commonly-used  color
resources,  with  an  efficient  mapping  from  RGB  color  values  to  pixel  values  for  display.  Keith
Packard  and  Donna  Converse  have  developed  an  algorithm  for  constructing  Standard
Colormaps  which  permits  other  applications  (those  using  the  normal  X  protocol  color  lookup
facilities)  to  also  share  the  same  color  resources.  This  is  particularly  important  on  typical
color  workstations  today,  which  support  only  a  single  hardware  colormap  with  a  limited
number  of  colormap  entries.  Keith  Packard  also  developed  a  revised  algorithm  for  creating
Standard  Colormaps  that  are  linear  ramps  through  the  RGB  cube,  which  now  allows
applications  to  treat  gray  scale  and  other  linear  maps  uniformly  with  all  other
Standard  Colormaps.  Donna  Converse  implemented  a  set  of  routines  for  creating  Standard
Colormaps  for  all  of  the  visual  classes  defined  by  the  X  protocol. 
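
To make the "efficient mapping from RGB color values to pixel values" concrete, here is a minimal sketch. It uses the pixel formula documented for Xlib's XStandardColormap structure (a base pixel plus per-channel maximums and multipliers); that the algorithms described above produce maps of exactly this form is an assumption, and the StandardColormap class and the example parameter values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardColormap:
    """Per-channel maximums and multipliers plus a base pixel value."""
    red_max: int
    red_mult: int
    green_max: int
    green_mult: int
    blue_max: int
    blue_mult: int
    base_pixel: int = 0

    def pixel(self, r, g, b):
        """Map r, g, b in [0.0, 1.0] to a pixel value without a server round trip."""
        return (self.base_pixel
                + int(0.5 + r * self.red_max) * self.red_mult
                + int(0.5 + g * self.green_max) * self.green_mult
                + int(0.5 + b * self.blue_max) * self.blue_mult) & 0xFFFFFFFF


# A 3/3/2 allocation of an 8-bit colormap (8 reds x 8 greens x 4 blues).
rgb_332 = StandardColormap(7, 32, 7, 4, 3, 1)

# A gray ramp is the same structure with only the red fields in use, so
# applications can treat it exactly like any other Standard Colormap.
gray_4 = StandardColormap(3, 1, 0, 0, 0, 0, base_pixel=16)

white = rgb_332.pixel(1.0, 1.0, 1.0)
pure_red = rgb_332.pixel(1.0, 0.0, 0.0)
brightest_gray = gray_4.pixel(1.0, 1.0, 1.0)
```

Since every cell is determined by arithmetic on the map's parameters, any client that knows the parameters can compute pixel values locally.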

15.2.7  Graphics  Benchmark 

Dan  Schmidt,  working  with  Jim  Fulton,  developed  a  graphics  benchmark  and  demonstration 
program.  The  program  provides  a  simple  interface  for  exercising  all  of  the  attributes  that  can 
affect  graphical  output  (such  as  line  style,  cap  style,  join  style,  fill  style,  dash  pattern,  and 
line  width)  in  combination  with  the  various  graphical  primitives  (such  as  points,  lines,  arcs, 
and  text).  In  addition  to  providing  a  means  for  obtaining  performance  data,  this  program 
will  also  be  a  valuable  interactive  demonstration  of  the  X  graphics  model. 

15.2.8  Xt  Intrinsics 


A  major  accomplishment  in  the  past  year  has  been  standardization  (within  the  Consortium)
of  the  Xt  Intrinsics,  an  object-oriented  foundation  for  building  user  interface  toolkits.  Such
toolkits  (including  the  MIT  Athena  Widgets  set)  are  available  from  a  growing  number  of
vendors.

Work  on  the  Intrinsics  is  far  from  finished,  however.  As  vendors  have  begun  to  use  the 
Intrinsics  in  earnest,  a  number  of  deficiencies  (in  both  function  and  performance)  have  been
identified.  In  particular,  using  a  window  for  every  user  interface  component  (called  “widgets”)
caused  concern  over  the  amount  of  memory  used  in  both  the  client  (where  the  toolkit
resides)  and  in  the  server.  As  a  result,  a  proposal  for  windowless  widgets  (originally  designed
and  implemented  by  Digital  Equipment  Corporation,  now  used  by  a  number  of  companies)
is  currently  under  review  within  the  Consortium,  under  the  overall  guidance  of  Ralph  Swick
of  MIT  Project  Athena.

15.2.9  Athena  Widgets

Chris  Peterson  has  been  doing  considerable  work  fixing  and  enhancing  the  Athena  Widget 
set,  which  is  used  in  a  growing  number  of  our  core  applications.  The  most  significant  recent 
addition  is  a  long-awaited  menu  widget,  supporting  both  pulldown  and  popup  menus.  This
widget  has  now  replaced  several  incompatible  menu  implementations  in  our  distribution  and
is  expected  to  be  widely  used. 

15.2.10  Core  Components 

There  are  now  several  product-quality  widget  sets  built  on  top  of  the  Xt  Intrinsics.  Although 
these  widget  sets  all  provide  remarkably  similar  functionality,  they  differ  considerably  in  their 
graphical  user  interface  (appearance  and  behavior)  and  their  programmatic  interface.  For
the  programmers  who  wish  to  have  their  applications  blend  in  with  the  other  applications
on  a  given  vendor’s  platform,  the  ability  to  easily  retarget  a  given  application  to  more  than
one  graphical  user  interface  is  crucial.  Unfortunately,  with  present  toolkits,  the  differences
in  programmatic  interfaces  make  this  task  quite  difficult  in  practice.

The  Core  Components  effort  is  an  attempt  to  specify  a  policy-free  application  programmer’s 
interface,  that  would  permit  different  implementations  to  embody  disparate  graphical  user 
interface  policies,  transparent  to  the  application  programmer.  Dana  Laursen  of
Hewlett-Packard  is  the  chief  architect  of  the  Core  Components.  Ralph  Swick  of  MIT  Project  Athena
and  Bob  Scheifler  have  contributed  to  the  fundamental  architecture,  and  a  variety  of
engineers  in  several  Consortium  organizations  have  contributed  to  the  design.  Although
considerable  progress  has  been  made,  a  number  of  very  hard  problems  still  exist,  such  as  how  to
permit  subclassing  without  exposing  functionality  that  is  specific  to  a  particular  graphical 
user  interface.  The  time  and  resource  investment  required  to  complete  the  research  now 
appears  to  be  too  large  for  many  Consortium  organizations,  who  must  focus  in  the  short 
term  on  shipping  initial  toolkit  products. 

15.2.11  Inter-client  Communication  Conventions  Manual 


The  Inter-client  Communication  Conventions  Manual  (ICCCM)  establishes  policies  covering 
a  number  of  mechanisms  in  the  X  protocol  in  order  to  allow  applications  from  independent 
vendors  to  coexist  and  cooperate  in  the  X  environment.  The  ICCCM  covers  the  use  of  the 
selection  mechanism  for  peer-to-peer  data  exchange  (e.g.  in  cut  and  paste  operations),  client 
to  window  manager  communication  (for  dealing  with  title  bars,  icons,  geometry,  input  focus, 
colormaps,  etc.),  client  to  session  manager  communication  (for  client  checkpoint  and  window
deletion),  and  client  manipulation  of  keyboard  and  pointing  device  attributes.  The  overall
architect  for  the  ICCCM  has  been  David  Rosenthal  of  Sun  Microsystems,  with  considerable
input  from  engineers  in  various  Consortium  organizations  and  all  of  the  MIT  staff.  The
document  is  now  out  for  its  second  public  review,  and  a  final  standard  will  be  produced
shortly.  Jim  Fulton  designed  and  implemented  the  Xlib  changes  required  to  support  the
ICCCM,  and  these  changes  are  also  now  out  for  public  review.

15.2.12  X  Display  Manager


Keith  Packard  produced  XDM,  the  X  Display  Manager,  for  Release  3.  This  daemon  manages 
a  collection  of  X  displays,  including  X  terminals,  on  a  given  host.  It  provides  for
authenticating  a  user  at  a  display  (i.e.,  login)  and  running  the  user’s  session.  Although  designed  to
work  with  both  hardware  displays  local  to  the  host  and  with  remote  X  terminals,  a  control 
protocol  is  required  to  make  X  terminals  work  well.  In  particular,  when  an  X  terminal  is 
powered  on  or  reset,  a  mechanism  is  needed  to  inform  the  host’s  XDM  daemon  that  the 
terminal  is  now  up,  so  that  login  can  be  initiated.  Keith  Packard  and  Bob  Scheifler  have 
designed  the  X  Display  Manager  Control  Protocol  (XDMCP)  for  this  purpose.  The  protocol 
also  deals  with  network  security  issues,  and  permits  centralized  configuration  management 
in  an  environment  with  a  large  number  of  terminals  and  potential  login  hosts.  The  protocol 
is  currently  under  review  within  the  Consortium. 

15.2.13  Security 


Security  has  long  been  a  low  priority  issue  in  the  X  world.  However,  with  the  increase  in
commercial  X  products,  and  with  the  rash  of  computer  break-ins  and  viruses  over  the  past 
year,  interest  in  security  is  now  rather  high.  The  default  host-based  access  control  mechanism 
in  the  core  X  protocol  is  simply  not  adequate  in  most  environments.  Keith  Packard  and  Jim 
Fulton  have  designed  and  implemented  a  basic  framework  for  allowing  X  clients  to  send 
authorization  information  to  the  X  server,  and  Keith  Packard  put  a  simple  encryption-based 
authorization  scheme  into  the  X  Display  Manager  and  the  X  server  as  a  test.  A  more  secure 
scheme  is  being  worked  on  as  part  of  the  X  Display  Manager  Control  Protocol,  and  work  is 
ongoing  at  MIT  Project  Athena  to  integrate  Kerberos  as  an  authorization  mechanism. 

15.2.14  Compound  Text 


Internationalization  (or  localization)  of  user  interface  software  is  increasingly  important  to  X 
vendors.  A  key  aspect  of  this  is  dealing  with  text  in  languages  other  than  English.  There  are
three  important  uses  of  text  in  the  X  environment  that  are  external  to  applications:
inter-client  communication  using  selections  (e.g.  cut  and  paste);  window  properties  (e.g.  text  for
title  bars);  and  resources  (e.g.  text  for  labels  and  prompts).  Typically,  different  languages
have  different  character  sets,  and  each  character  set  is  given  a  particular  encoding  (usually
one  or  two  bytes  per  character).  In  some  cases,  the  characters  used  for  a  single  language  are
split  across  more  than  one  character  set  encoding.


Bob  Scheifler  developed  a  format  for  multiple  character  set  data,  called  Compound  Text, 
based  on  ISO  standards  for  encoding  and  combining  character  sets  (ISO  2022  and  ISO  6429).
Compound  Text  is  intended  to  be  an  external  representation,  or  interchange  format,  for  use
in  the  three  areas  listed  above.  It  is  not  intended  to  be  an  internal  representation  within
an  application;  it  is  expected  (but  not  required)  that  clients  will  convert  Compound  Text
to  some  internal  representation  for  processing  and  rendering,  and  convert  from  that  internal
representation  to  Compound  Text  when  providing  textual  data  to  another  client.  The
format  supports  the  standard  ISO  8859  character  set  encodings  and  the  standard  Japanese, 
Chinese,  and  Korean  character  set  encodings,  and  encourages  their  use,  but  also  allows 
non-standard  encodings  to  be  used.  Horizontal  direction  of  text  can  also  be  encoded.  The 
Compound  Text  specification  is  now  out  for  public  review. 

15.2.15  X  Logical  Font  Description  Conventions 


Jim  Flowers  of  Digital  Equipment  has  been  the  chief  architect  of  the  X  Logical  Font  Description
Conventions,  which  establish  a  standard  parsable  font  name  format  and  standard  font
properties,  providing  X  clients  a  server-independent  means  to  query  and  use  a  rich  collection
of  fonts.  For  example,  the  conventions  provide  an  adequate  set  of  typographic  font  attributes 
for  publishing  and  other  applications  to  do  intelligent  font  matching  or  substitution  when 
handling  documents;  automatically  place  subscripts  and  superscripts;  and  determine  small 
capital  heights,  recommended  leading,  and  wordspace  values.  Bob  Scheifler  contributed  to 
the  design,  and  Jim  Fulton  worked  to  ensure  that  fonts  contributed  to  MIT  and  font  support 
mechanisms  (such  as  the  font  compiler)  conform  to  the  conventions.  The  conventions  are 
now  out  for  public  review  prior  to  finalizing  the  standard.
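
A short sketch shows what a "parsable font name" buys a client. The fourteen field names and the sample font name below follow the X Logical Font Description as later published; the draft under review at the time of this report may have differed in detail. The helpers parse_xlfd and matches are hypothetical, and the field-by-field wildcard comparison is a simplification of real X font pattern matching.

```python
XLFD_FIELDS = (
    "FOUNDRY", "FAMILY_NAME", "WEIGHT_NAME", "SLANT", "SETWIDTH_NAME",
    "ADD_STYLE_NAME", "PIXEL_SIZE", "POINT_SIZE", "RESOLUTION_X",
    "RESOLUTION_Y", "SPACING", "AVERAGE_WIDTH", "CHARSET_REGISTRY",
    "CHARSET_ENCODING",
)


def parse_xlfd(name):
    """Split a font name of the form -field-field-...-field into a dict."""
    if not name.startswith("-"):
        raise ValueError("expected a leading '-'")
    values = name[1:].split("-")
    if len(values) != len(XLFD_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(XLFD_FIELDS), len(values)))
    return dict(zip(XLFD_FIELDS, values))


def matches(pattern, name):
    """True when every field of `pattern` is '*' or equals the field in `name`."""
    wanted, actual = parse_xlfd(pattern), parse_xlfd(name)
    return all(wanted[f] in ("*", actual[f]) for f in XLFD_FIELDS)


courier = "-adobe-courier-bold-r-normal--12-120-75-75-m-70-iso8859-1"
fields = parse_xlfd(courier)
is_match = matches("-*-courier-*-r-*-*-12-*-*-*-*-*-iso8859-1", courier)
```

With the name split into fields, a publishing application can substitute a font that agrees on family, slant, and size without knowing anything about the server that supplies it.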

15.2.16  Font  Server 


With  the  advent  of  X  terminals  and  other  limited-memory  servers,  and  the  significant
increase  in  quality  screen  fonts,  the  idea  of  a  font  server  is  now  attractive.  A  font  server  is  a
program  for  providing  font  information  to  client  programs  (such  as  X  servers,  or  even  print
servers  and  document  previewers).  Like  the  X  server,  the  font  server  will  be  able  to
simultaneously  support  multiple  clients  of  differing  architectures  over  any  virtual  stream  connection.
The  font  server  should  be  able  to  deal  with  multiple  font  (input  and  output)  formats,  and 
provide  partial  font  information  (so  that  limited  client  memory  can  be  used  as  a  cache). 
The  font  server  must  also  be  able  to  enforce  licensing  restrictions  on  a  per-font  basis.  Tom
Porcher  of  Digital  Equipment  formulated  a  requirements  document  for  font  service,  and  Jim
Fulton  put  together  a  preliminary  design  for  a  font  server  protocol.  Daniel  Dardailler  of
Bull  is  interested  in  pursuing  the  design  and  implementation  to  completion,  and  we  hope  for 
more  progress  next  year. 

15.2.17  PEX  Sample  Implementation 

PEX  is  a  3D  graphics  extension  to  the  X  protocol,  supporting  the  PHIGS  and  PHIGS+
graphics  interfaces.  In  order  to  establish  proof  of  concept  of  the  PEX  design,  and  to  promote
the  use  of  PEX,  an  effort  is  now  underway  to  build  a  sample  implementation  of  both  the  PEX
server  extension  and  a  full  client  library  (providing  the  PHIGS  and  PHIGS+  programming 
interface).  Bob  Scheifler,  working  with  several  interested  companies,  put  together  a  Request 
For  Proposals.  A  number  of  bids  were  received,  and  one  was  selected  based  on  input  from
potential  sponsors.  There  are  now  15  sponsors  of  the  implementation  which  is  scheduled  for 
release  to  the  public  in  the  early  spring  of  1991. 

15.2.18  Input  Device  Extension 


The  core  X  protocol  deals  only  with  one  keyboard  and  one  pointing  device.  However,  many 
workstations  have  an  assortment  of  input  devices  which  would  be  very  useful  to  X  applications.
George  Sachs  of  Hewlett-Packard  and  Mark  Patrick  of  Ardent  Computer  produced  an
X  protocol  extension  and  a  C  library  for  dealing  with  additional  input  devices.  The  primary 
devices  supported  are  those  with  keys,  buttons,  and  one  or  more  axes  of  motion.  In  addition, 
the  extension  is  designed  to  itself  be  extensible,  so  that  new  classes  of  input  devices,  and  new 
combinations  of  classes,  can  easily  be  added.  Bob  Scheifler  and  Keith  Packard  contributed 
to  the  design,  as  have  engineers  from  a  number  of  Consortium  organizations. 

15.2.19  Video  Extension 


Todd  Brunhoff  of  Tektronix  has  been  working  on  an  X  protocol  extension  to  provide  an  X 
interface  to  the  generally  interesting  aspects  of  displaying  live  video  in  windows,  capturing
graphics  from  windows  and  converting  them  to  a  video  signal,  and  managing  the  network 
of  connections  to  and  from  devices  that  may  receive  or  produce  these  signals,  such  as  video 
tape  recorders. 

15.2.20  Multi-buffering  and  Stereo  Extension 


Jeff  Friedberg  and  Larry  Seiler  of  Digital  Equipment,  and  Jeff  Vroom  of  Stellar  Computer,
worked  to  merge  several  different  double-buffering  proposals  into  a  single  coherent  X  protocol
extension  for  supporting  multi-buffering  and  stereoscopic  viewing  of  windows.  The  extension
allows  multiple,  independently-addressable  output  buffers  to  be  associated  with  normal  and 
stereo  windows.  Any  of  the  buffers  can  be  displayed  in  the  window,  and  a  series  of  buffers 
can  be  displayed  in  rapid  succession  to  achieve  a  smooth  animation.  Keith  Packard  and  Bob 
Scheifler  contributed  to  the  design,  which  is  now  out  for  public  review. 
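
The animation technique mentioned above can be sketched as follows: each frame is drawn into a buffer that is not currently displayed, and that buffer is then made the displayed one. The MultiBufferWindow class and animate function are hypothetical illustrations of the idea, not the extension's protocol requests.

```python
class MultiBufferWindow:
    """Toy window with several independently addressable output buffers."""

    def __init__(self, buffer_count):
        self.buffers = [[] for _ in range(buffer_count)]
        self.displayed = 0  # index of the buffer currently shown

    def draw(self, index, primitive):
        self.buffers[index].append(primitive)

    def clear(self, index):
        self.buffers[index] = []

    def display(self, index):
        self.displayed = index

    def visible(self):
        return list(self.buffers[self.displayed])


def animate(window, frames):
    """Render each frame off screen, then switch the display to it."""
    shown = []
    for frame in frames:
        target = (window.displayed + 1) % len(window.buffers)
        window.clear(target)
        window.draw(target, frame)   # the viewer never sees a half-drawn frame
        window.display(target)
        shown.append(window.visible())
    return shown


window = MultiBufferWindow(buffer_count=3)
shown = animate(window, ["frame0", "frame1", "frame2", "frame3"])
```

Because drawing and display are decoupled, the viewer only ever sees complete frames, which is what makes the animation appear smooth.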

15.2.21  Nonrectangular  Windows


Keith  Packard  designed  and  implemented  an  X  protocol  extension  for  changing  the  visible
shape  of  a  window  to  an  arbitrary  nonrectangular  shape,  including  forms  composed  of  disjoint
pieces.  Each  window  is  defined  by  two  regions:  the  bounding  region  and  the  clip  region.  The
bounding  region  is  the  area  of  the  parent  window  which  the  window  will  occupy  (including
border).  The  clip  region  is  the  subset  of  the  bounding  region  which  is  available  for  subwindows
and  graphics.  The  area  between  the  bounding  region  and  the  clip  region  is  defined
to  be  the  border  of  the  window.  The  extension  proved  remarkably  simple  to  implement  in
the  server,  with  changes  required  in  just  a  few  places.  The  added  functionality  imposes  no
cost  penalty  on  rectangular  windows,  and  performance  for  nonrectangular  windows  is  quite 
acceptable.  The  extension  is  now  under  review  within  the  Consortium. 
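
The relationship between the two regions can be stated as set arithmetic: the border is the bounding region minus the clip region. The sketch below models regions as sets of pixel coordinates, which is an assumption made for clarity (a real server would use a more compact region representation); the rect helper and ShapedWindow class are hypothetical.

```python
def rect(x, y, width, height):
    """Return the set of pixel coordinates covered by a rectangle."""
    return {(x + i, y + j) for i in range(width) for j in range(height)}


class ShapedWindow:
    """Toy window defined by a bounding region and a clip region."""

    def __init__(self, bounding, clip):
        if not clip <= bounding:
            raise ValueError("the clip region must lie inside the bounding region")
        self.bounding = bounding
        self.clip = clip

    @property
    def border(self):
        # Everything the window occupies that is not available for drawing.
        return self.bounding - self.clip

    def occupies(self, point):
        return point in self.bounding

    def drawable(self, point):
        return point in self.clip


# A window made of two disjoint pieces; only part of the first is drawable.
bounding = rect(0, 0, 4, 4) | rect(10, 0, 2, 2)
clip = rect(1, 1, 2, 2)
window = ShapedWindow(bounding, clip)
border_size = len(window.border)
```

A rectangular window is the special case in which both regions are single rectangles, which is consistent with the observation that the added functionality costs rectangular windows nothing.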

15.2.22  X  Testing  Consortium 


The  X  Testing  Consortium  is  a  loosely  bound  group  of  approximately  a  dozen  companies 
working  together  on  comprehensive  test  software  for  the  X  protocol  and  the  Xlib  C  language 
interface  to  it.  Bob  Scheifler  has  been  meeting  with  this  group  over  the  past  year,  providing 
some  guidance.  The  output  of  the  group  will  be  both  informal  test  specifications,  and 
code  implementing  those  specifications.  The  group  also  did  some  work  on  performance 
benchmarks,  and  Jim  Fulton  provided  review  and  feedback  on  that  work.  Producing  a 
complete  test  suite  proved  to  require  considerably  more  effort  than  expected,  and  various 
companies  are  now  pulling  resources  off  the  project  to  work  on  other  tasks.  The  group 
will  be  producing  a  final  but  incomplete  release  soon.  Bob  Scheifler  is  working  on  plans  to 
continue  the  effort  within  the  X  Consortium,  and  to  expand  the  testing  effort  to  cover  other 
components  of  the  X  Window  System. 

15.2.23  Formal  Standards 


The  X3H3.6  Window  System  Task  Group,  under  the  X3H3  Computer  Graphics  Standards 
Committee,  under  ANSI,  has  been  working  on  formal  standardization  of  the  X  protocol  for 
about  two  years  now.  Bob  Scheifler  has  been  attending  the  X3H3.6  meetings  and  participating
in  their  deliberations.  The  protocol  specification  has  recently  gone  out  for  letter  ballot
within  X3H3,  but  an  ANSI  standard  is  still  perhaps  fourteen  months  away.  The  task  group
had  planned  to  start  work  on  Xlib  by  now,  but  lack  of  resources  has  made  that  impractical.
Both  the  task  group  and  the  X  Consortium  have  recently  approved  a  working  relationship, 
in  which  X3H3.6  will  ask  the  X  Consortium  to  develop  resolutions  to  technical  issues  as  they 
arise  during  the  remainder  of  the  ANSI  process. 

The  IEEE  Technical  Committee  on  Operating  Systems  formed  a  new  working  group,  P1201.1,
to  formally  standardize  (under  ANSI)  toolkit  functionality  and  behavior  in  the  X  environment.
Bob  Scheifler  and  Chris  Peterson  have  to  date  shared  responsibilities  for  interacting
with  this  group.  Their  immediate  goal  appears  to  be  standardization  of  a  widget  set  based
on  the  Xt  Intrinsics,  but  to  do  so  requires  that  the  Intrinsics  and  Xlib  be  standardized.  The
group  does  not  currently  have  the  resources  to  do  this,  and  is  coordinating  with  X3H3.6  in
an  attempt  to  develop  a  workable  plan. 

The  National  Institute  of  Standards  and  Technology  issued  a  proposed  Federal  Information
Processing  Standard  for  the  X  Window  System,  composed  of  the  standard  specifications  in
Release  3  of  the  X  Consortium  software  distribution:  the  X  protocol,  the  Xlib  C  binding,  the
Xt  Intrinsics,  and  the  Bitmap  Distribution  Format  for  fonts.  The  fact  that  NIST  issued  this
FIPS  without  waiting  for  formal  standardization  by  ANSI  caused  considerable  consternation
in  the  formal  standards  world,  and  the  fact  that  the  FIPS  appears  to  make  the  Intrinsics  an 
exclusive  federal  standard  (the  Intrinsics  are  a  non-exclusive  standard  in  the  X  Consortium) 
caused  considerable  consternation  on  the  part  of  certain  X  vendors.  Bob  Scheifler  has  been 
working  with  all  of  these  players  to  ensure  that  technical  arguments  are  correct,  and  to  try 
to  keep  the  political  arguments  in  perspective. 

15.2.24  Registration 


We  established  a  mechanism  to  allow  the  X  community  to  register  the  following  significant 
items:  organization  names  (used  as  prefixes  for  other  names),  keysyms,  authorization  protocol
names,  vendor  server  string  formats,  protocol  extension  names,  host  address  family
formats,  window  property  names  and  types,  selection  names  and  targets,  window  manager 
protocols,  font  foundry  names,  font  property  names,  resource  types,  and  application  classes. 
The  primary  goal  of  registration  is  to  avoid  conflicting  use  of  a  given  name  or  value.  The 
secondary  goal  is  to  encourage  use  of  these  items  by  more  than  one  organization. 
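
The primary goal can be illustrated with a minimal sketch of a registry that rejects conflicting use of a name. This is not the X Consortium's actual mechanism, which the report does not describe; the class `NameRegistry`, its methods, and the sample organization and extension names are all hypothetical.

```python
class NameRegistry:
    """Toy registry (hypothetical): one definition per (category, name)."""

    def __init__(self):
        self._entries = {}

    def register(self, category, name, owner, meaning):
        """Record a name; refuse a second, different definition of it."""
        key = (category, name)
        existing = self._entries.get(key)
        if existing is not None and existing != (owner, meaning):
            raise ValueError(
                f"{category} {name!r} is already registered by {existing[0]}"
            )
        self._entries[key] = (owner, meaning)

    def lookup(self, category, name):
        """Return (owner, meaning), or None if the name is unregistered."""
        return self._entries.get((category, name))


registry = NameRegistry()
# Made-up entries: an organization name, then an extension name using it as a prefix.
registry.register("organization", "ACME", "Acme Corp", "prefix for Acme names")
registry.register("extension", "ACME-Overlay", "Acme Corp", "made-up extension")
```

Registering the same name twice with the same definition is accepted, so several organizations can look up and share one entry, which corresponds to the secondary goal.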

15.2.25  Graphical  Programming  Environment  for  Configuration 


Geeta  Khare  explored  the  specification,  design,  and  implementation  of  a  graphical  programming
language  to  address  a  class  of  problems  known  as  configuration  problems,  those
concerned  with  the  correctness  and  completeness  of  a  collection  of  items  and/or  the  arrangement
of  those  items  under  a  set  of  constraints.  Domain  specificity  of  the  language
is  achieved  through  the  types  and  operations  provided.  A  configuration  task  grammar  was 
produced,  which  lists  the  breakdown  of  configuration  tasks  into  subtasks  and  primitives,  and 
a  categorization  was  produced  of  objects  found  in  configuration  problems,  allowing  reuse  of 
primitives  in  different  configuration  problems.  A  prototype  was  implemented  in  Common 
Lisp  using  KnowledgeCraft. 
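
To make the notion of a configuration problem concrete, the following is a small sketch of a completeness and constraint check over a collection of items. It is not the graphical language or the Common Lisp prototype described above; the function names, item fields, and the sample hardware configuration are all assumptions made for illustration.

```python
def total(items, field):
    """Sum a numeric field over all items, treating a missing field as 0."""
    return sum(item.get(field, 0) for item in items)


def check_configuration(items, required, constraints):
    """Return a list of problems; an empty list means the configuration passes."""
    problems = []
    # Completeness: every required kind of item must be present.
    for kind in required:
        if not any(item["kind"] == kind for item in items):
            problems.append(f"missing required item: {kind}")
    # Correctness: every constraint must hold over the whole collection.
    for description, predicate in constraints:
        if not predicate(items):
            problems.append(f"constraint violated: {description}")
    return problems


# Hypothetical configuration: a chassis and the boards placed in it.
items = [
    {"kind": "chassis", "slots": 4, "power_supplied": 300},
    {"kind": "cpu_board", "slots_used": 1, "power_drawn": 120},
    {"kind": "disk", "slots_used": 1, "power_drawn": 90},
]
required = ["chassis", "cpu_board", "memory_board"]
constraints = [
    ("boards fit in the available slots",
     lambda its: total(its, "slots_used") <= total(its, "slots")),
    ("power drawn does not exceed power supplied",
     lambda its: total(its, "power_drawn") <= total(its, "power_supplied")),
]

problems = check_configuration(items, required, constraints)
```

Here the only problem reported is the missing memory board. Because the checks are expressed over generic item fields, the same primitives could be reused for a different configuration problem by changing only the item data and the constraint list.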